Phylogenomics and the Diversity
 and Diversification of Microbes

         Jonathan A. Eisen
            UC Davis

            UCSF Talk
         February 17, 2011
Phylogenomics of Novelty


 Mechanisms of         Variation in
 Origin of New        Mechanisms:
   Functions         Patterns, Causes
                       and Effects




          Species Evolution
Outline

• Introduction
• Phylogenomic Stories
  – Within genome invention of novelty
  – Stealing novelty
  – Communities of microbes
  – Community service and knowing what we don’t know
Introduction
rRNA Tree of Life




Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Limited Sampling of RRR Studies
[Figure: rRNA tree of life with labeled taxa: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga]
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
UV Survival E. coli vs. H. volcanii
[Figure: relative survival (log scale, 1 to 1E-07) versus UV dose (0 to 400 J/m2) for E. coli NR10121 mfd-, E. coli NR10125 mfd+ and H. volcanii WFD11. Source label: TIGR]
H. volcanii UV Repair (Label 7 - 45 J/m2)
[Figure: panel "Label5#2"; x axis Avg. Mol. Wt. (Base Pairs), 0 to 18000; y axis 0 to 0.6; curves for 0 J/m2 t0, 45 J/m2 t0, 45 J/m2 Photoreac., and 45 J/m2 Dark 24 Hours]
Fleischmann et al. 1995
TIGR Genome Projects
[Figure: rRNA tree of life with labeled taxa: Haloferax, Methanococcus, Chlorobium, Deinococcus, Thermotoga]
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
From http://genomesonline.org
Phylogenomics and the diversification of microbes: JA Eisen at UCSF 2/17/11
Human commensals
From http://genomesonline.org
Phylogenomics of Novelty I

  Origin of Functions from Within
From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Blast Search of H. pylori “MutS”




• BLAST search pulls up Synechocystis sp. MutS#2 with a much more
  significant P value than other MutS homologs
• Based on this, TIGR predicted this species had mismatch
  repair
• Assumes functional constancy
                   Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Predicting Function
• Identification of motifs
   – Short regions of sequence similarity that are indicative of
     general activity
   – e.g., ATP binding
• Homology/similarity based methods
   – Gene sequence is searched against a database of other
     sequences
   – If significantly similar genes are found, their functional
     information is used
• Problem
   – Genes frequently have similarity to hundreds of motifs
     and multiple genes, not all with the same function
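The problem stated above can be illustrated with a small sketch. It assumes a similarity search has already produced a list of hits, each with an E-value and an annotated function. The `Hit` class, the function names, the 1e-5 cutoff and the toy hit list are all hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    subject: str    # name of the database sequence (hypothetical)
    evalue: float   # smaller means more significant
    function: str   # annotated function of the database sequence


def top_hit_function(hits, max_evalue=1e-5):
    """Transfer the function of the single most significant hit."""
    significant = [h for h in hits if h.evalue <= max_evalue]
    if not significant:
        return None
    return min(significant, key=lambda h: h.evalue).function


def functions_among_hits(hits, max_evalue=1e-5):
    """List every distinct function seen among the significant hits."""
    return sorted({h.function for h in hits if h.evalue <= max_evalue})


# Hypothetical hit list for an uncharacterized query gene.
hits = [
    Hit("homolog_A", 1e-40, "function X"),
    Hit("homolog_B", 1e-35, "function Y"),
    Hit("homolog_C", 1e-30, "function Y"),
    Hit("weak_match", 0.5, "function Z"),
]

print(top_hit_function(hits))       # annotation a top-hit rule would assign
print(functions_among_hits(hits))   # the significant hits do not all agree
```

In this toy case the top-hit rule assigns one function even though the significant hits carry two different annotations, which is the situation the "Problem" bullet describes.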
MutL??




Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
Overlaying Functions onto Tree
[Figure: gene tree of MutS homologs with subfamilies labeled MutS1, MutS2, MSH1, MSH2, MSH3, MSH4, MSH5 and MSH6; leaves are labeled by species abbreviation (e.g., Ecoli, Bacsu, Synsp, Helpy, Yeast, Human)]
Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
Evolutionary Functional Prediction
[Figure: flowchart with a METHOD column between two worked examples]
• Method
   – Choose gene(s) of interest
   – Identify homologs
   – Align sequences
   – Calculate gene tree
   – Overlay known functions onto tree
   – Infer likely function of gene(s) of interest
   – Actual evolution (assumed to be unknown)
• Example A: genes 1A, 1B (Species 1), 2A, 2B (Species 2) and 3A, 3B (Species 3), related by a duplication
• Example B: genes 1 to 6, where the inferred function is ambiguous
Based on Eisen, 1998 Genome Res 8: 163-167.
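The overlay and inference steps of the method can be sketched in a few lines. This is only an illustration: it assumes the gene tree is already computed and encoded as nested tuples of leaf names, and it uses a simple "smallest enclosing clade with annotated relatives" rule. The encoding, the function names, the rule and the toy annotations are assumptions, not the procedure from the cited paper.

```python
def leaves(node):
    """Return all leaf names under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def path_to(node, target):
    """Return the nodes from this node down to the target leaf, or None."""
    if isinstance(node, str):
        return [node] if node == target else None
    for child in node:
        sub = path_to(child, target)
        if sub is not None:
            return [node] + sub
    return None


def infer_function(tree, query, known):
    """Infer the query's function from its closest annotated relatives.

    Walks outward from the query through progressively larger clades and
    stops at the first clade that contains annotated genes.
    """
    path = path_to(tree, query)
    if path is None:
        raise ValueError(f"{query!r} is not a leaf of the tree")
    for clade in reversed(path[:-1]):
        functions = {known[leaf] for leaf in leaves(clade)
                     if leaf != query and leaf in known}
        if len(functions) == 1:
            return functions.pop()
        if len(functions) > 1:
            return "ambiguous"
    return None  # no annotated relatives anywhere in the tree


# Toy tree shaped like Example A: a duplication gives an A clade and a B clade.
tree_a = (("1A", ("2A", "3A")), ("1B", ("2B", "3B")))
known_a = {"1A": "function A", "3A": "function A",
           "1B": "function B", "3B": "function B"}

# Toy tree in the spirit of Example B: the nearest relatives disagree.
tree_b = (("1", "2"), ("5", ("3", "4")))
known_b = {"1": "function X", "2": "function X",
           "3": "function X", "4": "function Y"}

print(infer_function(tree_a, "2A", known_a))
print(infer_function(tree_a, "2B", known_a))
print(infer_function(tree_b, "5", known_b))
```

The point of the sketch is that the answer depends on where the gene sits in the tree, not on which single sequence is most similar to it.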
Example 2: Recent Changes
[Figure: NJ gene tree with many V. cholerae genes plus genes from B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans and M. tuberculosis; some branches are marked with * or **]
• Phylogenomic functional prediction may not work well for very newly
  evolved functions
• Can use understanding of origin of novelty to better interpret these cases?
• Screen genomes for genes that have changed recently
   – Pseudogenes and gene loss
   – Contingency Loci
   – Acquisition (e.g., LGT)
   – Unusual dS/dN ratios
   – Rapid evolutionary rates
   – Recent duplications
RIPPING

Galagan et al. Genome sequence reveals significant underrepresentation of recently duplicated genes.

[Figure: paired DNA strands (CATGTACAGCA / GTACATGTCGT) shown before and after RIP, with C-G pairs converted to T-A pairs and CH3 methyl groups marked on the mutated copies]

FIGURE 12.30. RIPPING. “The repeat-induced point mutation (RIP) process in Neurospora crassa.
Duplications that occur during the vegetative phase are detected by RIP during the sexual cycle
after fertilization but before the DNA synthesis and nuclear fusion (karyogamy). Duplicated
sequences that are longer than ~400 bp (or ~1 kb for unlinked duplications as shown) and sharing
greater than ~80% nucleotide identity are detected. Numerous C-G to T-A point mutations are
introduced into both copies (unmutated C-G pairs are shown in blue; mutations are shown in red
letters; only a small number of base pairs are shown for clarity). RIP-mutated sequences are
frequent targets for methylation, which results in transcriptional silencing in Neurospora. In contrast
to mammals and plants, methylation is not limited to symmetric sites.”
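
The legend above describes RIP as a two-step process: detect a duplicated pair that passes length and identity thresholds, then introduce C-G to T-A mutations into both copies. The following is a minimal Python sketch of that logic, not a model of the real biology. The thresholds come from the legend; the function names, the mutation rate, the random seed and the assumption that the two copies are already aligned and of equal length are all hypothetical choices made for illustration.

```python
import random

MIN_LENGTH_BP = 400   # legend: linked duplications longer than ~400 bp
MIN_IDENTITY = 0.80   # legend: greater than ~80% nucleotide identity


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (assumes pre-aligned, equal-length copies)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_rip_target(a: str, b: str) -> bool:
    """True if the pair passes both thresholds quoted in the legend."""
    return len(a) > MIN_LENGTH_BP and identity(a, b) > MIN_IDENTITY


def rip_mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Turn C-G pairs into T-A pairs, written on one strand.

    C -> T is a mutation on this strand; G -> A reflects a C -> T
    mutation on the complementary strand. `rate` is an assumed
    per-site probability, not a measured value.
    """
    out = []
    for base in seq:
        if base in "CG" and rng.random() < rate:
            out.append("T" if base == "C" else "A")
        else:
            out.append(base)
    return "".join(out)


def rip(a: str, b: str, rate: float = 0.3, seed: int = 0):
    """Mutate both copies if they are detected as a duplication."""
    if not is_rip_target(a, b):
        return a, b
    rng = random.Random(seed)
    return rip_mutate(a, rate, rng), rip_mutate(b, rate, rng)


# Toy duplicated sequence built from the figure's motif (440 bp).
copy = "CATGTACAGCA" * 40
mutated_a, mutated_b = rip(copy, copy)
print(copy.count("C") + copy.count("G"), "C/G sites before")
print(mutated_a.count("C") + mutated_a.count("G"), "C/G sites after, copy 1")
```

A pair shorter than the length threshold is returned unchanged, which mirrors the legend's point that only sufficiently long, sufficiently similar duplications are detected.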
Tetrahymena thermophila
macronuclear genome project
Tetrahymena’s two nuclear genomes

                 Micronucleus (MIC)
                   Germline Genome
                     (Silent)
                   5 pairs of chromosomes

                 Macronucleus (MAC)
                   Somatic genome
                     (Expressed)
                   250-300 chromosomes
                     @ ~45 copies each
Macronuclear Differentiation
Tetrahymena Genome Processing




                            • Analogous to RIPPING and
                              heterochromatin silencing
                            • Targets new/foreign DNA, not duplicated
                              DNA
                            • Does not limit diversification by
                              duplication


Eisen et al. 2006. PLoS Biology.
Phylogenomics of Novelty II

Sometimes, it is easier to steal, borrow, or
 coopt functions rather than evolve them
                  anew
rRNA Tree of Life
Bacteria




                                       Archaea




 Eukaryotes

    Figure from Barton, Eisen et al.
       “Evolution”, CSHL Press.
  Based on tree from Pace NR, 2003.
Perna et al. 2003
Network of Life
Bacteria




                                       Archaea




 Eukaryotes

    Figure from Barton, Eisen et al.
       “Evolution”, CSHL Press.
  Based on tree from Pace NR, 2003.
articles

Arabidopsis thaliana

* Authorship of this paper should be cited as 'The Arabidopsis Genome Initiative'. A full list of contributors appears at the end of this paper

The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions.
Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the
125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication,
followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene
transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000
families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular
eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets
of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first
complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes
in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify
genes for crop improvement.

The plant and animal kingdoms evolved independently from unicellular eukaryotes and represent highly contrasting life forms.
The genome sequences of C. elegans1 and Drosophila2 reveal that metazoans share a great deal of genetic information required for
developmental and physiological processes, but these genome sequences represent a limited survey of multicellular organisms.
Flowering plants have unique organizational and physiological properties in addition to ancestral features conserved between
plants and animals. The genome sequence of a plant provides a means for understanding the genetic basis of differences between
plants and other eukaryotes, and provides the foundation for detailed functional characterization of plant genes.
   Arabidopsis thaliana has many advantages for genome analysis, including a short generation time, small size, large number of
offspring, and a relatively small nuclear genome. These advantages promoted the growth of a scientific community that has
investigated the biological processes of Arabidopsis and has characterized many genes3. To support these activities, an
international collaboration (the Arabidopsis Genome Initiative, AGI) began sequencing the genome in 1996. The sequences of
chromosomes 2 and 4 have been reported4,5, and the accompanying Letters describe the sequences of chromosomes 1 (ref. 6),
3 (ref. 7) and 5 (ref. 8).
   Here we report analysis of the completed Arabidopsis genome

[...]

biologists, but will also affect agricultural science, evolutionary biology, bioinformatics, combinatorial chemistry, functional and
comparative genomics, and molecular medicine.

Overview of sequencing strategy
We used large-insert bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC)
libraries9-12 as the primary substrates for sequencing. Early stages of genome sequencing used 79 cosmid clones. Physical maps of the
genome of accession Columbia were assembled by restriction fragment 'fingerprint' analysis of BAC clones13, by hybridization14
or polymerase chain reaction (PCR)15 of sequence-tagged sites and by hybridization and Southern blotting16. The resulting maps were
integrated (http://nucleus/cshl.org/arabmaps/) with the genetic map and provided a foundation for assembling sets of contigs
into sequence-ready tiling paths. End sequence (http://www.tigr.org/tdb/at/abe/bac_end_search.html) of 47,788 BAC clones
was used to extend contigs from BACs anchored by marker content and to integrate contigs.
   Ten contigs representing the chromosome arms and centromeric heterochromatin were assembled from 1,569 BAC, TAC, cosmid and
P1 clones (average insert size 100 kilobases (kb)). Twenty-two PCR products were amplified directly from genomic DNA and
Correlated gain/loss of genes

• Microbial genes are lost rapidly when not
  maintained by selection
• Genes can be acquired by lateral transfer
• Gain and loss frequently occur for entire
  pathways/processes
• Thus we might be able to use correlated
  presence/absence information to identify
  genes with similar functions
Non-Homology Predictions:
    Phylogenetic Profiling

• Step 1: Search all genes in
  organisms of interest against all
  other genomes

• Ask: yes or no, is each gene
  found in each other species?

• Cluster genes by distribution
  patterns (profiles)
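
The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the homology search has already been run and its results reduced to a yes/no answer per gene and genome, and it clusters genes simply by grouping identical profiles. The gene and genome names, and the function names `build_profiles` and `cluster_by_profile`, are hypothetical.

```python
from collections import defaultdict


def build_profiles(hits, genomes):
    """Turn search results into presence/absence profiles.

    hits: dict mapping each gene to the set of genomes where a
          homolog was found (assumed to come from a prior search).
    genomes: ordered list of genomes, which fixes the profile columns.
    """
    return {
        gene: tuple(1 if genome in found_in else 0 for genome in genomes)
        for gene, found_in in hits.items()
    }


def cluster_by_profile(profiles):
    """Group genes that share exactly the same distribution pattern."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)


# Hypothetical input: four genes searched against four genomes.
genomes = ["genome1", "genome2", "genome3", "genome4"]
hits = {
    "geneA": {"genome1", "genome3"},
    "geneB": {"genome1", "genome3"},
    "geneC": {"genome2"},
    "geneD": {"genome1", "genome2", "genome3", "genome4"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

Here geneA and geneB share a profile, so they would be candidates for having related functions. Grouping only identical profiles is the simplest possible choice; a real analysis would more likely use a distance between profiles so that near-matches also cluster together.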
Carboxydothermus hydrogenoformans


• Isolated from a Russian hot spring
• Thermophile (grows at 80°C)
• Anaerobic
• Grows very efficiently on CO
  (Carbon Monoxide)
• Produces hydrogen gas
• Low GC Gram positive
  (Firmicute)
• Genome determined (Wu et al.
  2005 PLoS Genetics 1: e65)
Homologs of Sporulation Genes




                         Wu et al. 2005
                         PLoS Genetics 1:
                         e65.
Carboxydothermus sporulates




       Wu et al. 2005 PLoS Genetics 1: e65.
Wu et al. 2005 PLoS Genetics 1: e65.
Stealing Organisms (Symbioses)
Mutualistic Genome Evolution

• Compare and contrast different types of
  mutualistic symbioses
• Diverse hosts, symbionts, biology, ages
• Organelles, chemosymbioses,
  photosynthetic symbioses, nutritional
  symbioses
• What are the rules & patterns?
Glassy Winged Sharpshooter
                 • Obligate xylem feeder
                 • Can transmit Pierce’s
                   Disease agent
                 • Potential bioterror agent
                 • Needs to get amino
                   acids and other nutrients
                   from symbionts, as
                   aphids do
Sharpshooter Shotgun Sequencing




                              shotgun

   Collaboration with Nancy Moran’s lab
   Wu et al. 2006 PLoS Biology 4: e188.
Phylogenomics and the diversification of microbes: JA Eisen at UCSF 2/17/11
Higher Evolutionary Rates in
                   Endosymbionts




Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
Variation in Evolution Rates


                                                                MutS         MutL
                                                                +            +
                                                                +            +
                                                                +            +
                                                                +            +
                                                                _            _
                                                                _            _

Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
Baumannia is a Vitamin and
Cofactor Producing Machine




                             Wu et al. 2006 PLoS Biology 4: e188.
No Amino-Acid Synthesis
The Uncultured Majority
Great Plate Count Anomaly




Culturing     Microscope

  Count         Count
Great Plate Count Anomaly




Culturing       Microscope

  Count     <<<< Count
Great Plate Count Anomaly


                             DNA




Culturing       Microscope

  Count     <<<< Count
rRNA PCR

The Hidden Majority            Richness estimates




             Hugenholtz 2002         Bohannan and Hughes 2003
rRNA data increasing exponentially too
Perna et al. 2003
Metagenomics


         shotgun




                   clone
How can we best use
         metagenomic data?
• Many possible uses including:
  – Improvements on rRNA based phylotyping and
    species diversity measurements
  – Adding functional information on top of
    phylogenetic/species diversity information
• Most/all possible uses either require or are
  improved with phylogenetic analysis
Example I: Phylotyping with
   rRNA and other genes
Functional Diversity of Proteorhodopsins?




                                 Venter et al., 2004
Sargasso Phylotypes
Shotgun Sequencing Allows Use of Other Markers

[Figure: bar chart. Y-axis: Weighted % of Clones (0 to 0.5). X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]

Venter et al., Science 304: 66-74. 2004
Example II: Binning
Metagenomics Challenge
Binning challenge

A                       T
B                       U
C                       V
D                       W
E                       X
F                       Y
G                       Z
Binning challenge

A                                            T
B                                            U
C                                            V
D                                            W
E                                            X
F                                            Y
G   Best binning method: reference genomes   Z
Binning challenge

A                                          T
B                                          U
C                                          V
D                                          W
E                                          X
F                                          Y
G   No reference genome? What do you do?   Z
No Amino-Acid Synthesis
???????
CFB Phyla
Sulcia makes amino acids




Baumannia makes vitamins and cofactors




                         Wu et al. 2006 PLoS Biology 4: e188.
Phylogenomics of Novelty III

  Knowing What We Don’t Know
Research Topics

 Mechanisms of         Variation in
 Origin of New        Mechanisms:
   Functions         Patterns, Causes
                       and Effects

          Species Evolution
As of 2002
As of 2002

• At least 40 phyla of bacteria

             Proteobacteria
             TM6
             OS-K
             Acidobacteria
             Termite Group
             OP8
             Nitrospira
             Bacteroides
             Chlorobi
             Fibrobacteres
             Marine Group A
             WS3
             Gemmimonas
             Firmicutes
             Fusobacteria
             Actinobacteria
             OP9
             Cyanobacteria
             Synergistes
             Deferribacteres
             Chrysiogenetes
             NKB19
             Verrucomicrobia
             Chlamydia
             OP3
             Planctomycetes
             Spirochaetes
             Coprothermobacter
             OP10
             Thermomicrobia
             Chloroflexi
             TM7
             Deinococcus-Thermus
             Dictyoglomus
             Aquificae
             Thermodesulfobacteria
             Thermotogae
             OP1
             OP11

Based on Hugenholtz, 2002
As of 2002

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla

             Proteobacteria
             TM6
             OS-K
             Acidobacteria
             Termite Group
             OP8
             Nitrospira
             Bacteroides
             Chlorobi
             Fibrobacteres
             Marine Group A
             WS3
             Gemmimonas
             Firmicutes
             Fusobacteria
             Actinobacteria
             OP9
             Cyanobacteria
             Synergistes
             Deferribacteres
             Chrysiogenetes
             NKB19
             Verrucomicrobia
             Chlamydia
             OP3
             Planctomycetes
             Spirochaetes
             Coprothermobacter
             OP10
             Thermomicrobia
             Chloroflexi
             TM7
             Deinococcus-Thermus
             Dictyoglomus
             Aquificae
             Thermodesulfobacteria
             Thermotogae
             OP1
             OP11

Based on Hugenholtz, 2002
As of 2002

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled

             Proteobacteria
             TM6
             OS-K
             Acidobacteria
             Termite Group
             OP8
             Nitrospira
             Bacteroides
             Chlorobi
             Fibrobacteres
             Marine Group A
             WS3
             Gemmimonas
             Firmicutes
             Fusobacteria
             Actinobacteria
             OP9
             Cyanobacteria
             Synergistes
             Deferribacteres
             Chrysiogenetes
             NKB19
             Verrucomicrobia
             Chlamydia
             OP3
             Planctomycetes
             Spirochaetes
             Coprothermobacter
             OP10
             Thermomicrobia
             Chloroflexi
             TM7
             Deinococcus-Thermus
             Dictyoglomus
             Aquificae
             Thermodesulfobacteria
             Thermotogae
             OP1
             OP11

Based on Hugenholtz, 2002
Need for Tree Guidance Well Established

• Common approach within some eukaryotic
  groups

• Many small projects funded to fill in some
  bacterial or archaeal gaps

• Phylogenetic gaps in bacterial and archaeal
  projects commonly lamented in literature
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen, Ward, Robb, Nelson, et al.
[Figure: tree of bacterial phyla, Proteobacteria through OP11]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution I: sequence more phyla
Phylogenomics and the diversification of microbes: JA Eisen at UCSF 2/17/11
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, Proteobacteria through OP11]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still highly biased in terms of the tree
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, Proteobacteria through OP11]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Archaea
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, Proteobacteria through OP11]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, Proteobacteria through OP11]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses
• GEBA
• A genomic encyclopedia of bacteria and archaea
Eisen & Ward, PIs
[Figure: tree of bacterial phyla, Proteobacteria through OP11]
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Solution: Really Fill in the Tree
http://www.jgi.doe.gov/programs/GEBA/pilot.html
GEBA Pilot Project: Components
• Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan
  Eisen, Eddy Rubin, Jim Bristow)
• Project management (David Bruce, Eileen Dalin, Lynne Goodwin)
• Culture collection and DNA prep (DSMZ, Hans-Peter Klenk)
• Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus,
  Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng)
• Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al.)
• Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor
  Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik
  D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N.
  Ivanova, Athanasios Lykidis, Adam Zemla)
• Adopt a microbe education project (Cheryl Kerfeld)
• Outreach (David Gilbert)
• $$$ (DOE, Eddy Rubin, Jim Bristow)
GEBA Pilot Project Overview
• Identify major branches in rRNA tree for
  which no genomes are available
• Identify those with a cultured representative in
  DSMZ
• DSMZ grew > 200 of these and prepped DNA
• Sequence and finish 100+ (covering the breadth of
  bacterial and archaeal diversity)
• Annotate, analyze, release data
• Assess benefits of tree-guided sequencing
• First paper: Wu et al., Nature, Dec 2009
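The selection step above can be sketched as a greedy search for the candidates that add the most new branch length to what is already sequenced. This is only an illustration under stated assumptions, not the GEBA pipeline: the toy tree, its branch lengths, and the names TREE, path_to_root, pd_gain, and pick_genomes are all hypothetical, and "new branch length" is an assumed stand-in for "major branches for which no genomes are available".

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy tree: node -> (parent, branch length to parent).
# Names and lengths are invented for illustration only.
Tree = Dict[str, Tuple[Optional[str], float]]
TREE: Tree = {
    "root": (None, 0.0),
    "A": ("root", 0.1),
    "B": ("root", 0.4),
    "a1": ("A", 0.1),
    "a2": ("A", 0.2),
    "b1": ("B", 0.3),
    "b2": ("B", 0.5),
}


def path_to_root(tree: Tree, leaf: str) -> List[str]:
    """Nodes on the way from leaf to root; each stands for its parent branch."""
    path = []
    node = leaf
    while tree[node][0] is not None:
        path.append(node)
        node = tree[node][0]
    return path


def pd_gain(tree: Tree, leaf: str, covered: Set[str]) -> float:
    """Branch length that leaf would add beyond the already covered branches."""
    return sum(tree[n][1] for n in path_to_root(tree, leaf) if n not in covered)


def pick_genomes(tree: Tree, sequenced: List[str],
                 candidates: List[str], k: int) -> List[str]:
    """Greedily pick up to k candidates, each maximizing newly covered length."""
    covered: Set[str] = set()
    for leaf in sequenced:
        covered.update(path_to_root(tree, leaf))
    remaining = sorted(candidates)  # sorted for deterministic tie-breaking
    picks: List[str] = []
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda leaf: pd_gain(tree, leaf, covered))
        picks.append(best)
        remaining.remove(best)
        covered.update(path_to_root(tree, best))
    return picks


# One lineage (a1) is already sequenced; the candidates stand for cultured isolates.
print(pick_genomes(TREE, ["a1"], ["a2", "b1", "b2"], 2))  # ['b2', 'b1']
```

Both picks come from the unsampled branch B, because a2 shares most of its path with the already sequenced a1.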
Network of Life
[Figure: tree with three labeled groups: Bacteria, Archaea, Eukaryotes]
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
GEBA Lesson 1:
The rRNA Tree of Life is a Useful Tool
for Identifying Phylogenetically Novel
From Wu et al. 2009 Nature 462, 1056-1060
GEBA Lesson 2:
The rRNA Tree of Life is not perfect ...
[Figure: panels labeled "16S" and "WGT, 23S"]
Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
GEBA Lesson 3:
Phylogeny-driven genome selection (and
phylogenetics) improves genome annotation
• Took 56 GEBA genomes and compared results vs. 56
  randomly sampled new genomes
• Better definition of protein family sequence “patterns”
• Greatly improves “comparative” and “evolutionary”
  based predictions
• Conversion of hypotheticals into conserved hypotheticals
• Linking distantly related members of protein families
• Improved non-homology prediction
GEBA Lesson 4:
 Metadata Important
GEBA Phylogenomic Lesson 5

  Phylogeny-driven genome selection
  helps discover new genetic diversity
Phylogenetic Distribution Novelty:
Bacterial Actin Related Protein
[Figure: phylogenetic tree of actin-related proteins; tip labels and support values not recoverable]
Haliangium ochraceum DSM 14365
Patrik D’haeseleer, Adam Zemla, Victor Kunin
Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
Network of Life
[Figure: tree with three labeled groups: Bacteria, Archaea, Eukaryotes]
Figure from Barton, Eisen et al. “Evolution”, CSHL Press.
Based on tree from Pace NR, 2003.
Protein Family Rarefaction Curves
• Take data set of multiple complete genomes
• Identify all protein families using MCL
• Plot # of genomes vs. # of protein families
Wu et al. 2009 Nature 462, 1056-1060
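The three steps above can be sketched as follows. This is a minimal illustration, not the analysis from Wu et al.: it assumes the protein families have already been assigned (for example by MCL) and skips the clustering itself. The function name rarefaction_curve, the toy genomes, and the family labels are hypothetical. Averaging over random genome orders is an assumed smoothing choice.

```python
import random
from typing import Dict, List, Set


def rarefaction_curve(families_by_genome: Dict[str, Set[str]],
                      n_shuffles: int = 100, seed: int = 0) -> List[float]:
    """Mean number of distinct protein families after adding 1, 2, ... genomes.

    Genomes are added in random order; the cumulative family count is
    averaged over n_shuffles orders.
    """
    rng = random.Random(seed)
    genomes = sorted(families_by_genome)
    totals = [0.0] * len(genomes)
    for _ in range(n_shuffles):
        rng.shuffle(genomes)
        seen: Set[str] = set()
        for i, genome in enumerate(genomes):
            seen |= families_by_genome[genome]  # families seen so far
            totals[i] += len(seen)
    return [total / n_shuffles for total in totals]


# Hypothetical input: genome name -> set of protein family identifiers.
toy = {
    "g1": {"f1", "f2"},
    "g2": {"f2", "f3"},
    "g3": {"f4"},
}
curve = rarefaction_curve(toy)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, round(n_families, 2))
```

Plotting the returned values against 1..N genomes gives the curve. A curve that keeps climbing suggests that each added genome still contributes new families.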
Synapomorphies exist




Wu et al. 2009 Nature 462, 1056-1060
Families/PD not uniform
[Figure: plot; axis labels not recoverable]
Structural Novelty
• Of the 17000 protein families in the GEBA56, 1800
  are novel in sequence (Wu)


• Structural modeling suggests many are structurally
  novel too (D'haeseleer)


• 372 being crystallized by the PSI (Kerfeld)
GEBA Phylogenomic Lesson 6

  Improves analysis of genome data
     from uncultured organisms
Shotgun Sequencing Allows Use of Other Markers
Sargasso Phylotypes
[Figure: bar chart. Y axis: Weighted % of Clones (0 to 0.5). X axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Series: EFG, EFTu, rRNA, RecA, RpoB, HSP70]
Venter et al., Science 304: 66-74. 2004
[Figure: bar chart. Y axis: Weighted % of Clones (0 to 0.5). X-axis labels begin with Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria]
                                                                                                      rm
                                                                                                           ic
                                                                                                               ut
                                                                                        Ac                        e   s
                                                                                            tin
                                                                                               ob
                                                                                                          ac
                                                                                                             te
                                                                                                      C            ria
                                                                                                          hl
                                                                                                             o  ro
                                                                                                                    bi
                                                                                                                               without good


                                                                                                               C
                                                                                                                   FB




                                          Major Phylogenetic Group
                                                                                                                                                                                 Sargasso Phylotypes




                                                                                             C
                                                                                                                               Cannot be done




                                                                                                  hl
                                                                                                     o     ro
                                                                                                                fle
                                                                                         Sp                           xi
                                                                                                 iro
                                                                                                      ch
                                                                                                           ae
                                                                                         Fu                      te
                                                                                                 so                   s
                                                                     D                                ba
                                                                         ei                                ct
                                                                           no                                    er
                                                                              c                      ia
                                                                                                                               sampling of genomes




                                                                                    oc
                                                                                      cu
                                                                                         s-
                                                                                    Eu Th
                                                                                       ry erm
                                                                                          ar
                                                                                             ch us
                                                                                    C           ae
                                                                                      re           ot
                                                                                         na           a
                                                                                            rc
                                                                                               ha
                                                                                                  eo
                                                                                                     ta
                                                                                                                                                                                              Shotgun Sequencing Allows Use of Other Markers




                                                                                                                                    EFG




Venter et al., Science 304: 66-74. 2004
                                                                                                                                    EFTu



                                                                                                                                    rRNA
                                                                                                                                    RecA
                                                                                                                                    RpoB
                                                                                                                                    HSP70
Phylogenetic Binning

Figure: bar chart of Sargasso Phylotypes. X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Y-axis: Weighted % of Clones, 0 to 0.5. One series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.

Venter et al., Science 304: 66-74. 2004
Shotgun Sequencing Allows Use of Other Markers

Figure: bar chart of Sargasso Phylotypes. X-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Y-axis: Weighted % of Clones, 0 to 0.5. One series per marker: EFG, EFTu, HSP70, RecA, RpoB, rRNA.

Cannot be done without good sampling of genomes

Venter et al., Science 304: 66-74. 2004
Figure: bar chart. Y-axis: Weighted % of Clones, 0 to 0.5. X-axis groups: Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria.

improves
                                                                                                          hl
                                                                                                             o  ro
                                                                                                                    bi
                                                                                                               C
                                                                                                                                    GEBA Project




                                                                                                                   FB




                                          Major Phylogenetic Group
                                                                                                                                                                                 Sargasso Phylotypes




                                                                                             C
                                                                                                  hl
                                                                                                     o     ro
                                                                                                                fle
                                                                                         Sp                           xi
                                                                                                 iro
                                                                                                      ch
                                                                                                           ae
                                                                                         Fu                      te
                                                                                                 so                   s
                                                                     D                                ba
                                                                         ei                                ct
                                                                           no                                    er
                                                                              c     oc               ia
                                                                                      cu
                                                                                                                                    metagenomic analysis




                                                                                         s-
                                                                                    Eu Th
                                                                                       ry erm
                                                                                          ar
                                                                                             ch us
                                                                                    C           ae
                                                                                      re           ot
                                                                                         na           a
                                                                                            rc
                                                                                               ha
                                                                                                  eo
                                                                                                     ta
                                                                                                                                                                                              Shotgun Sequencing Allows Use of Other Markers




                                                                                                                                    EFG




Venter et al., Science 304: 66-74. 2004
                                                                                                                                    EFTu



                                                                                                                                    rRNA
                                                                                                                                    RecA
                                                                                                                                    RpoB
                                                                                                                                    HSP70
GEBA Cyano
Sequencing status (as of 01/14):
   Awaiting Material    11
   Library              12
   Production           22
   Finishing             5
   Grand Total          50

Ongoing/Planned Activities:
   - Building Cyanobacterial Metadatabase (IMG-GOLD)
   - 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10)
        --> Cheryl will host: Workshop training as prep for virtual Jamboree
GEBA RNB
Plan:
Sequence multiple Root Nodule Bacteria (RNBs) across
   the planet. Pilot: 100 RNBs.

Goal:
•   Understand biogeographical effects on species
    evolution and understand host specificity.

Rationale:
•   N2 fixation by legume pastures and crops provides 65% of
    the N currently utilized in agricultural production.
•   Contributes 25 to 90 million metric tonnes N per year.
•   Symbioses save US$6-10 billion annually on N fertilizer.
•   Grain and animal production enhanced by fixed nitrogen
    supplied by the symbiosis.

Beta RNB
  Cupriavidus
  Burkholderia
Alpha RNB
  Azorhizobium
  Allorhizobium
  Bradyrhizobium
  Mesorhizobium
  Rhizobium
  Sinorhizobium
  Devosia
  Ochrobactrum
  Phyllobacterium
  Balneimonas-like

Nikos Kyrpides
• NSF-funded Tree of Life Project
• A genome from each of eight phyla
Eisen & Ward, PIs

• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Still not happy

Phyla labeled in the figure: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11
Shotgun Sequencing Allows Use of Other Markers
GEBA Project improves metagenomic analysis, but only a little
[Figure: Sargasso Phylotypes. Weighted % of clones (axis 0 to 0.5) by major phylogenetic group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota). Markers: EFG, EFTu, rRNA, RecA, RpoB, HSP70.]
Venter et al., Science 304: 66-74. 2004
Phylogenomics Future 1

    Need to adapt genomic and
metagenomic methods to make better
            use of data
iSEEM Project
Ways to Make Better Use of
              GEBA Data
•   Better phylogenetic methods for short reads
•   Rebuild protein family models
•   New phylogenetic markers
•   Need better phylogenies, including HGT
•   Improved tools for using distantly related
    genomes in metagenomic analysis
Finding Metagenomic OTUs
PhylOTU: A High-Throughput Procedure Quantifies
Microbial Community Diversity and Resolves Novel Taxa
from Metagenomic Data
Thomas J. Sharpton1*, Samantha J. Riesenfeld1, Steven W. Kembel2, Joshua Ladau1, James P.
O’Dwyer2,3, Jessica L. Green2, Jonathan A. Eisen4, Katherine S. Pollard1,5
1 The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America, 2 Center for Ecology and Evolutionary
Biology, University of Oregon, Eugene, Oregon, United States of America, 3 Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom,
4 Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America, 5 Institute for Human Genetics & Division of Biostatistics,
University of California San Francisco, San Francisco, California, United States of America



     Abstract
     Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic
     units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in
     priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids
     amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-
     finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs
     from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles.
     Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons
     of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries
     identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In
     addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by
     analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the
     biosphere currently hidden from PCR-based surveys of diversity.

  Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community
  Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
  Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel
  Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011
  Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
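
The abstract describes clustering SSU-rRNA sequences into OTUs. A minimal Python sketch of that general idea follows. It is not the PhylOTU implementation: it assumes pairwise distances between reads are already available (PhylOTU derives them with phylogenetic methods because shotgun reads often do not overlap), and the function name `cluster_otus`, the complete-linkage rule, and the 0.03 cutoff are illustrative assumptions.

```python
from itertools import combinations


def cluster_otus(names, dist, threshold=0.03):
    """Group reads into OTUs by complete-linkage agglomerative clustering.

    names     -- list of read identifiers
    dist      -- dict mapping (read_a, read_b) pairs to a distance
                 (assumed precomputed; either key order is accepted)
    threshold -- assumed cutoff: clusters are merged only while the
                 largest distance between their members stays at or below it
    """
    clusters = [[name] for name in names]

    def pair_distance(a, b):
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    def linkage(c1, c2):
        # Complete linkage: distance between the two most distant members.
        return max(pair_distance(a, b) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # no remaining pair is similar enough to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: four reads forming two tight pairs.
example_reads = ["r1", "r2", "r3", "r4"]
example_dist = {
    ("r1", "r2"): 0.01, ("r1", "r3"): 0.20, ("r1", "r4"): 0.22,
    ("r2", "r3"): 0.21, ("r2", "r4"): 0.23, ("r3", "r4"): 0.02,
}
example_otus = cluster_otus(example_reads, example_dist, threshold=0.03)
```

With the toy distances above, `r1` and `r2` end up in one OTU and `r3` and `r4` in another. Loosening the threshold merges OTUs; tightening it splits them.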
Distances between gene trees and the AMPHORA concatenated genome tree
[Figure: two panels showing, for each marker gene, the distance between its gene tree and the genome tree. Legend: AMPHORA marker; ribosomal protein; transcription/translation related protein; DNA repair protein; protein of other function. Also shown: distance between the genome tree and 100 random trees (average ± standard deviation).]
Panel 1, NODAL distance (axis 0 to 6), genes in the order shown, top to bottom:
  rpmA, coaE, trmD, rpsS, radA, rplD, tsf, frr, ttf, rplR, rplM, rplI, rpsB, rpsO, mraW, rpsH, rplQ, rplL, rplT, rplE, rpsP, rplC, rplV, rplS, infC, rpsM, rplO, rplU, rpsL, rpsQ, guaA, rpsG, smpB, priA, rpsK, rplK, serS, rplA, rplF, ruvA, rpsC, rplN, rplP, rpsE, pyrH, rpsI, secY, rpsJ, purA, rplB, nusA, ruvB, rRNA16S
Panel 2, SPLIT distance (axis 0 to 0.9), genes in the order shown, top to bottom:
  coaE, rpmA, rplL, rpsQ, rplR, rplQ, rpsH, smpB, rpsO, rplP, rpsS, rplV, rplT, rplO, rpsP, rpsK, rplU, tsf, trmD, rplS, ttf, rpsI, mraW, rpsL, rpsG, rplM, rplI, pyrH, rpsM, ruvA, radA, purA, rplK, rplD, infC, rplC, rplE, rplA, frr, rplF, serS, rplN, guaA, ruvB, rpsB, rpsJ, rRNA16S, secY, rplB, priA, rpsE, rpsC, nusA
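
For readers unfamiliar with split distance, here is a small Python sketch of one common definition: the Robinson-Foulds distance, normalized by the total number of non-trivial splits in both trees. Whether the figure uses exactly this normalization is an assumption. The nested-tuple tree encoding and the function names are hypothetical, chosen only to keep the example self-contained.

```python
def leaves(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits (bipartitions) of the leaf set.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the result does not depend on where the tree
    happens to be rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - clade if ref in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    walk(tree)
    return found


def split_distance(tree_a, tree_b):
    """Normalized split distance: 0 = same topology, 1 = no shared splits."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must have the same leaf set")
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    total = len(splits_a) + len(splits_b)
    return len(splits_a ^ splits_b) / total if total else 0.0


# Hypothetical five-taxon trees: the "gene tree" swaps B and C.
genome_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
example_distance = split_distance(genome_tree, gene_tree)
```

In this toy case each tree has two non-trivial splits and they share one, so the distance is 2/4 = 0.5. A gene tree with a low distance to the genome tree is, in this sense, a marker whose history tracks the genome tree closely.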
Screen gene markers for any given taxonomic group

Phylogenetic group      Genome Number   Gene Number   Marker Candidates
Archaea                 62              145415        106
Actinobacteria          63              267783        136
Alphaproteobacteria     94              347287        121
Betaproteobacteria      56              266362        311
Gammaproteobacteria     126             483632        118
Deltaproteobacteria     25              102115        206
Epsilonproteobacteria   18              33416         455
Bacteroides             25              71531         286
Chlamydiae              13              13823         560
Chloroflexi             10              33577         323
Cyanobacteria           36              124080        590
Firmicutes              106             312309        87
Spirochaetes            18              38832         176
Thermi                  5               14160         974
Thermotogae             9               17037         684
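
The slide does not state the screening criteria, so the Python sketch below is only one plausible reading: within a taxonomic group, keep gene families that are never multi-copy and are present as a single copy in most genomes. The function name, the input layout, and the 0.9 cutoff are all assumptions made for illustration.

```python
def screen_marker_candidates(copy_counts, genomes, min_fraction=0.9):
    """Pick candidate marker families for one taxonomic group.

    copy_counts  -- dict: family name -> {genome name: number of copies}
    genomes      -- list of genome names in the group
    min_fraction -- assumed cutoff: minimum fraction of genomes that must
                    carry exactly one copy of the family
    """
    candidates = []
    for family, counts in sorted(copy_counts.items()):
        single_copy = sum(1 for g in genomes if counts.get(g, 0) == 1)
        multi_copy = any(counts.get(g, 0) > 1 for g in genomes)
        # Assumed criteria: no paralogs anywhere, near-universal presence.
        if not multi_copy and single_copy / len(genomes) >= min_fraction:
            candidates.append(family)
    return candidates


# Hypothetical toy input: four genomes, four gene families.
example_genomes = ["g1", "g2", "g3", "g4"]
example_counts = {
    "famA": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},  # universal, single copy
    "famB": {"g1": 2, "g2": 1, "g3": 1, "g4": 1},  # duplicated in g1
    "famC": {"g1": 1, "g2": 1},                    # present in only half
    "famD": {"g1": 1, "g2": 1, "g3": 1},           # missing from g4
}
example_candidates = screen_marker_candidates(
    example_counts, example_genomes, min_fraction=0.75
)
```

With a 0.75 cutoff the toy input keeps `famA` and `famD`. Raising the cutoff to 1.0 would keep only `famA`, which illustrates why smaller or more closely related groups in the table can yield more marker candidates.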
Phylogenomics and the diversification of microbes: JA Eisen at UCSF 2/17/11
Phylogenomics Future 2

We have still only scratched the
 surface of microbial diversity
rRNA Tree of Life

Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
Phylogenetic Diversity: Sequenced Bacteria & Archaea
From Wu et al. 2009 Nature 462, 1056-1060

Phylogenetic Diversity with GEBA
From Wu et al. 2009 Nature 462, 1056-1060

Phylogenetic Diversity: Isolates
From Wu et al. 2009 Nature 462, 1056-1060

Phylogenetic Diversity: All
From Wu et al. 2009 Nature 462, 1056-1060
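
Phylogenetic diversity in these slides is a tree-based quantity. The sketch below assumes the common definition (Faith's PD: the total length of the branches connecting a set of taxa to the root) and a hypothetical toy tree stored as child -> (parent, branch length). It is meant only to show why adding a distantly related genome raises the total more than adding a close relative.

```python
def phylogenetic_diversity(parent, taxa):
    """Sum branch lengths on the paths from the chosen taxa up to the root.

    Each branch is counted once, even when several taxa share it.
    """
    seen = set()
    total = 0.0
    for node in taxa:
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


# Toy tree (assumed data): A and B are close relatives, C is a deep branch
toy = {
    "A": ("n1", 1.0),
    "B": ("n1", 2.0),
    "n1": ("root", 0.5),
    "C": ("root", 3.0),
}
print(phylogenetic_diversity(toy, ["A"]))
print(phylogenetic_diversity(toy, ["A", "B"]))
print(phylogenetic_diversity(toy, ["A", "C"]))
```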
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Most phyla with cultured species are sparsely sampled
• Lineages with no cultured taxa are even more poorly sampled

[Figure: tree of bacterial phyla. Tip labels: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine Group A, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11. Legend: well sampled phyla; poorly sampled; no cultured taxa.]
Uncultured Lineages: Technical Approaches
• Get into culture
• Enrichment cultures
• If abundant in low diversity ecosystems
• Flow sorting
• Microbeads
• Microfluidic sorting
• Single cell amplification
GEBA uncultured
Number of SAGs from Candidate Phyla

Site                                      OD1   OP11   OP3   SAR406
Site A: Hydrothermal vent                 4     1      -     -
Site B: Gold Mine                         6     13     2     -
Site C: Tropical gyres (Mesopelagic)      -     -      -     2
Site D: Tropical gyres (Photic zone)      1     -      -     -

Sample collections at 4 additional sites are underway.

Phil Hugenholtz
Phylogenomics Future 3

Need Experiments from Across the
        Tree of Life too
As of 2002
• At least 40 phyla of bacteria
• Experimental studies are mostly from three phyla
• Some studies in other phyla

[Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002.]
As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Eukaryotes

[Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002.]
As of 2002
• At least 40 phyla of bacteria
• Genome sequences are mostly from three phyla
• Some other phyla are only sparsely sampled
• Same trend in Viruses

[Figure: tree of bacterial phyla, Proteobacteria through OP11. Based on Hugenholtz, 2002.]
Need experimental studies from across the tree too

[Figure: tree of bacterial phyla, Proteobacteria through OP11, scale bar 0.1. Tree based on Hugenholtz (2002) with some modifications.]
Adopt a Microbe

[Figure: tree of bacterial phyla, Proteobacteria through OP11, scale bar 0.1. Tree based on Hugenholtz (2002) with some modifications.]
Conclusion

• Phylogenetic sampling of genomes
  improves our understanding of microbial
  diversity in many ways
• Still need
  – More biogeography
  – More phenotypic/experimental data
  – Deeper phylogenetic sampling
MICROBES
A Happy Tree of Life
Acknowledgements

• GEBA: DOE-JGI, DSMZ
• GWSS: Nancy Moran & lab, Dongying Wu
• iSEEM: Katie Pollard, Jessica Green,
  Martin Wu
• RecA: Dongying Wu, Craig Venter, Doug
  Rusch, et al.

Phylogenomics and the diversification of microbes: JA Eisen at UCSF 2/17/11

  • 1. Phylogenomics and the Diversity and Diversification of Microbes Jonathan A. Eisen UC Davis UCSF Talk February 17, 2011
  • 3. Phylogenomics of Novelty Mechanisms of Origin of New Functions
  • 4. Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects
  • 5. Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
  • 6. Phylogenomics of Novelty: Mechanisms of Origin of New Functions; Variation in Mechanisms: Patterns, Causes and Effects; Species Evolution
  • 7. Outline • Introduction • Phylogenomic Stories – Within genome invention of novelty – Stealing novelty – Communities of microbes – Community service and knowing what we don’t know
  • 9. rRNA Tree of Life Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 10. Limited Sampling of RRR Studies Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 11. Limited Sampling of RRR Studies Haloferax Methanococcus Chlorobium Deinococcus Thermotoga Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 12. UV Survival E.coli vs H.volcanii [Plot: relative survival (log scale, 1 down to 1E-07) against UV dose (0 to 400 J/m2). Series: E.coli NR10121 mfd-; E.coli NR10125 mfd+; H.volcanii WFD11. Plot label: TIGR.]
  • 13. H. volcanii UV Repair [Plot (Label 7, 45 J/m2). Series: 0 J/m2 t0; 45 J/m2 t0; 45 J/m2 photoreactivation; 45 J/m2 dark, 24 hours. x axis: average molecular weight (base pairs), 0 to 18000; y axis: 0 to 0.6.]
  • 15. TIGR Genome Projects Haloferax Methanococcus Chlorobium Deinococcus Thermotoga Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 22. Phylogenomics of Novelty I Origin of Functions from Within
  • 23. From Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 24. Blast Search of H. pylori “MutS” • Blast search pulls up Syn. sp MutS#2 with much higher p value than other MutS homologs • Based on this TIGR predicted this species had mismatch repair • Assumes functional constancy Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 25. Predicting Function • Identification of motifs – Short regions of sequence similarity that are indicative of general activity – e.g., ATP binding • Homology/similarity based methods – Gene sequence is searched against a database of other sequences – If significantly similar genes are found, their functional information is used • Problem – Genes frequently have similarity to hundreds of motifs and multiple genes, not all with the same function
  • 26. MutL?? Based on Eisen et al. 1997 Nature Medicine 3: 1076-1078.
  • 27. Overlaying Functions onto Tree [Figure: gene tree of the MutS family with subfamily labels MutS1, MutS2 and MSH1 to MSH6.] Based on Eisen, 1998 Nucl Acids Res 26: 4291-4300.
  • 29. Evolutionary Functional Prediction [Figure: the method applied to two examples (A and B). Steps: choose gene(s) of interest; identify homologs; align sequences; calculate gene tree; overlay known functions onto tree; infer likely function of gene(s) of interest. Final panel: actual evolution (assumed to be unknown), with duplication events marked.] Based on Eisen, 1998 Genome Res 8: 163-167.
  • 30. Example 2: Recent Changes • Phylogenomic functional prediction may not work well for very newly evolved functions • Can use understanding of origin of novelty to better interpret these cases? • Screen genomes for genes that have changed recently – Pseudogenes and gene loss – Contingency Loci – Acquisition (e.g., LGT) – Unusual dS/dN ratios – Rapid evolutionary rates – Recent duplications [Figure: neighbor-joining (NJ) tree of homologs from V. cholerae, B. subtilis, Synechocystis sp., H. pylori, C. jejuni, A. fulgidus, E. coli, T. maritima, T. pallidum, B. burgdorferi, P. abyssi, P. horikoshii, D. radiodurans and M. tuberculosis.]
  • 31. RIPPING Galagan et al. Genome sequence reveals significant underrepresentation of recently duplicated genes. FIGURE 12.30. RIPPING. “The repeat-induced point mutation (RIP) process in Neurospora crassa. Duplications that occur during the vegetative phase are detected by RIP during the sexual cycle after fertilization but before the DNA synthesis and nuclear fusion (karyogamy). Duplicated sequences that are longer than ~400 bp (or ~1 kb for unlinked duplications as shown) and sharing greater than ~80% nucleotide identity are detected. Numerous C-G to T-A point mutations are introduced into both copies (unmutated C-G pairs are shown in blue; mutations are shown in red letters; only a small number of base pairs are shown for clarity). RIP-mutated sequences are frequent targets for methylation, which results in transcriptional silencing in Neurospora. In contrast to mammals and plants, methylation is not limited to symmetric sites.”
  • 33. Tetrahymena’s two nuclear genomes Micronucleus (MIC) Germline Genome (Silent) 5 pairs of chromosomes Macronucleus (MAC) Somatic genome (Expressed) 250-300 chromosomes @ ~45 copies each
  • 35. Tetrahymena Genome Processing • Analogous to RIPPING and heterochromatin silencing • Targets new/foreign DNA not duplicated DNA • Does not limit diversification by duplication Eisen et al. 2006. PLoS Biology.
  • 36. Phylogenomics of Novelty II Sometimes, it is easier to steal, borrow, or coopt functions rather than evolve them anew
  • 37. rRNA Tree of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 38. Perna et al. 2003
  • 39. Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 40. [Image of a journal article page] articles Arabidopsis thaliana. Authorship of this paper should be cited as 'The Arabidopsis Genome Initiative'. A full list of contributors appears at the end of this paper. Abstract: The flowering plant Arabidopsis thaliana is an important model system for identifying genes and determining their functions. Here we report the analysis of the genomic sequence of Arabidopsis. The sequenced regions cover 115.4 megabases of the 125-megabase genome and extend into centromeric regions. The evolution of Arabidopsis involved a whole-genome duplication, followed by subsequent gene loss and extensive local gene duplications, giving rise to a dynamic genome enriched by lateral gene transfer from a cyanobacterial-like ancestor of the plastid. The genome contains 25,498 genes encoding proteins from 11,000 families, similar to the functional diversity of Drosophila and Caenorhabditis elegans, the other sequenced multicellular eukaryotes. Arabidopsis has many families of new proteins but also lacks several common protein families, indicating that the sets of common proteins have undergone differential expansion and contraction in the three multicellular eukaryotes. This is the first complete genome sequence of a plant and provides the foundations for more comprehensive comparison of conserved processes in all eukaryotes, identifying a wide range of plant-specific gene functions and establishing rapid systematic ways to identify genes for crop improvement. Left column: The plant and animal kingdoms evolved independently from unicellular eukaryotes and represent highly contrasting life forms. The genome sequences of C. elegans1 and Drosophila2 reveal that metazoans share a great deal of genetic information required for developmental and physiological processes, but these genome sequences represent a limited survey of multicellular organisms. Flowering plants have unique organizational and physiological properties in addition to ancestral features conserved between plants and animals. The genome sequence of a plant provides a means for understanding the genetic basis of differences between plants and other eukaryotes, and provides the foundation for detailed functional characterization of plant genes. Arabidopsis thaliana has many advantages for genome analysis, including a short generation time, small size, large number of offspring, and a relatively small nuclear genome. These advantages promoted the growth of a scientific community that has investigated the biological processes of Arabidopsis and has characterized many genes3. To support these activities, an international collaboration (the Arabidopsis Genome Initiative, AGI) began sequencing the genome in 1996. The sequences of chromosomes 2 and 4 have been reported4,5, and the accompanying Letters describe the sequences of chromosomes 1 (ref. 6), 3 (ref. 7) and 5 (ref. 8). Here we report analysis of the completed Arabidopsis genome [text cut off at column end] Right column: [text begins mid-sentence] biologists, but will also affect agricultural science, evolutionary biology, bioinformatics, combinatorial chemistry, functional and comparative genomics, and molecular medicine. Overview of sequencing strategy: We used large-insert bacterial artificial chromosome (BAC), phage (P1) and transformation-competent artificial chromosome (TAC) libraries9-12 as the primary substrates for sequencing. Early stages of genome sequencing used 79 cosmid clones. Physical maps of the genome of accession Columbia were assembled by restriction fragment 'fingerprint' analysis of BAC clones13, by hybridization14 or polymerase chain reaction (PCR)15 of sequence-tagged sites and by hybridization and Southern blotting16. The resulting maps were integrated (http://nucleus/cshl.org/arabmaps/) with the genetic map and provided a foundation for assembling sets of contigs into sequence-ready tiling paths. End sequence (http://www.tigr.org/tdb/at/abe/bac_end_search.html) of 47,788 BAC clones was used to extend contigs from BACS anchored by marker content and to integrate contigs. Ten contigs representing the chromosome arms and centromeric heterochromatin were assembled from 1,569 BAC, TAC, cosmid and P1 clones (average insert size 100 kilobases (kb)). Twenty-two PCR products were amplified directly from genomic DNA and [text cut off at column end]
  • 41. Correlated gain/loss of genes • Microbial genes are lost rapidly when not maintained by selection • Genes can be acquired by lateral transfer • Gain and loss frequently occur for entire pathways/processes • Thus correlated presence/absence information might be used to identify genes with similar functions
  • 42. Non-Homology Predictions: Phylogenetic Profiling • Step 1: Search all genes in organisms of interest against all other genomes • Ask: Yes or No, is each gene found in each other species • Cluster genes by distribution patterns (profiles)
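The profiling steps on slide 42 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, genome names and search results are invented, the homology search itself is assumed to have been run already, and genes are grouped by exactly identical profiles, whereas a real analysis would more likely cluster by profile similarity.

```python
from collections import defaultdict

def build_profiles(hits, genomes):
    """Return {gene: tuple of 0/1 flags}, one flag per genome in `genomes`."""
    return {gene: tuple(int(g in found) for g in genomes)
            for gene, found in hits.items()}

def cluster_by_profile(profiles):
    """Group genes that share an identical presence/absence profile."""
    clusters = defaultdict(list)
    for gene, profile in sorted(profiles.items()):
        clusters[profile].append(gene)
    return dict(clusters)

# Hypothetical search results: gene -> genomes in which a homolog was found.
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]
hits = {
    "spoA": {"genomeA", "genomeC"},
    "spoB": {"genomeA", "genomeC"},
    "rpoB": {"genomeA", "genomeB", "genomeC", "genomeD"},
    "hypX": {"genomeA", "genomeC"},
}

profiles = build_profiles(hits, genomes)
clusters = cluster_by_profile(profiles)
for profile, genes in clusters.items():
    print(profile, genes)
```

In this toy input the uncharacterized gene hypX shares its profile with spoA and spoB, which is the kind of co-occurrence that would suggest a related function.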
  • 43. Carboxydothermus hydrogenoformans • Isolated from a Russian hot spring • Thermophile (grows at 80°C) • Anaerobic • Grows very efficiently on CO (Carbon Monoxide) • Produces hydrogen gas • Low GC Gram positive (Firmicute) • Genome Determined (Wu et al. 2005 PLoS Genetics 1: e65.)
  • 44. Homologs of Sporulation Genes Wu et al. 2005 PLoS Genetics 1: e65.
  • 45. Carboxydothermus sporulates Wu et al. 2005 PLoS Genetics 1: e65.
  • 46. Wu et al. 2005 PLoS Genetics 1: e65.
  • 48. Mutualistic Genome Evolution • Compare and contrast different types of mutualistic symbioses • Diverse hosts, symbionts, biology, ages • Organelles, chemosymbioses, photosynthetic symbioses, nutritional symbioses • What are the rules & patterns?
  • 49. Glassy Winged Sharpshooter • Obligate xylem feeder • Can transmit Pierce’s Disease agent • Potential bioterror agent • Needs to get amino acids and other nutrients from symbionts, as aphids do
  • 50. Sharpshooter Shotgun Sequencing Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s lab
  • 54. Higher Evolutionary Rates in Endosymbionts Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  • 55. Variation in Evolution Rates [Figure: lineages marked + or - for MutS and MutL] Wu et al. 2006 PLoS Biology 4: e188. Collaboration with Nancy Moran’s Lab
  • 56. Baumannia is a Vitamin and Cofactor Producing Machine Wu et al. 2006 PLoS Biology 4: e188.
  • 60. Great Plate Count Anomaly: Culturing Count, Microscope Count
  • 61. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count
  • 62. Great Plate Count Anomaly: Culturing Count <<<< Microscope Count; DNA
  • 63. rRNA PCR The Hidden Majority Richness estimates Hugenholtz 2002 Bohannan and Hughes 2003
  • 64. rRNA data increasing exponentially too
  • 65. Perna et al. 2003
  • 66. Metagenomics shotgun clone
  • 67. How can we best use metagenomic data? • Many possible uses including: – Improvements on rRNA based phylotyping and species diversity measurements – Adding functional information on top of phylogenetic/species diversity information • Most/all possible uses either require or are improved with phylogenetic analysis
  • 68. Example I: Phylotyping with rRNA and other genes
  • 69. Functional Diversity of Proteorhodopsins? Venter et al., 2004
  • 70. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso Phylotypes; y-axis: Weighted % of Clones (0 to 0.5); x-axis: Major Phylogenetic Group (Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Deltaproteobacteria, Cyanobacteria, Firmicutes, Actinobacteria, Chlorobi, CFB, Chloroflexi, Spirochaetes, Fusobacteria, Deinococcus-Thermus, Euryarchaeota, Crenarchaeota); markers: EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 73. Binning challenge [Figure labels: A, B, C, D, E, F, G and T, U, V, W, X, Y, Z]
  • 74. Binning challenge • Best binning method: reference genomes [Figure labels: A-G and T-Z]
  • 75. Binning challenge • Best binning method: reference genomes [Figure labels: A-G and T-Z]
  • 76. Binning challenge • No reference genome? What do you do? [Figure labels: A-G and T-Z]
  • 82. Sulcia makes amino acids; Baumannia makes vitamins and cofactors. Wu et al. 2006 PLoS Biology 4: e188.
  • 83. Phylogenomics of Novelty III Knowing What We Don’t Know
  • 84. Research Topics • Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
  • 85. Research Topics • Mechanisms of Origin of New Functions • Variation in Mechanisms: Patterns, Causes and Effects • Species Evolution
  • 87. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11] Based on Hugenholtz, 2002
  • 88. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 89. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 90. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 91. Need for Tree Guidance Well Established • Common approach within some eukaryotic groups • Many small projects funded to fill in some bacterial or archaeal gaps • Phylogenetic gaps in bacterial and archaeal projects commonly lamented in literature
  • 92. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution I: sequence more phyla • NSF-funded Tree of Life Project • A genome from each of eight phyla [Figure: tree of bacterial phyla] Eisen, Ward, Robb, Nelson, et al
  • 94. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still highly biased in terms of the tree • NSF-funded Tree of Life Project • A genome from each of eight phyla [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 95. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Archaea • NSF-funded Tree of Life Project • A genome from each of eight phyla [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 96. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes • NSF-funded Tree of Life Project • A genome from each of eight phyla [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 97. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses • NSF-funded Tree of Life Project • A genome from each of eight phyla [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 98. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Solution: Really Fill in the Tree • GEBA • A genomic encyclopedia of bacteria and archaea [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 100. GEBA Pilot Project: Components • Project overview (Phil Hugenholtz, Nikos Kyrpides, Jonathan Eisen, Eddy Rubin, Jim Bristow) • Project management (David Bruce, Eileen Dalin, Lynne Goodwin) • Culture collection and DNA prep (DSMZ, Hans-Peter Klenk) • Sequencing and closure (Eileen Dalin, Susan Lucas, Alla Lapidus, Mat Nolan, Alex Copeland, Cliff Han, Feng Chen, Jan-Fang Cheng) • Annotation and data release (Nikos Kyrpides, Victor Markowitz, et al) • Analysis (Dongying Wu, Kostas Mavrommatis, Martin Wu, Victor Kunin, Neil Rawlings, Ian Paulsen, Patrick Chain, Patrik D’Haeseleer, Sean Hooper, Iain Anderson, Amrita Pati, Natalia N. Ivanova, Athanasios Lykidis, Adam Zemla) • Adopt a microbe education project (Cheryl Kerfeld) • Outreach (David Gilbert) • $$$ (DOE, Eddy Rubin, Jim Bristow)
  • 101. GEBA Pilot Project Overview • Identify major branches in rRNA tree for which no genomes are available • Identify those with a cultured representative in DSMZ • DSMZ grew > 200 of these and prepped DNA • Sequence and finish 100+ (covering breadth of bacterial/archaeal diversity) • Annotate, analyze, release data • Assess benefits of tree guided sequencing • 1st paper Wu et al in Nature Dec 2009
  • 102. Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 103. GEBA Lesson 1: The rRNA Tree of Life is a Useful Tool for Identifying Phylogenetically Novel From Wu et al. 2009 Nature 462, 1056-1060
  • 104. GEBA Lesson 2: The rRNA Tree of Life is not perfect ... [Figure panels: 16S, WGT, 23S] Badger et al. 2005 Int J System Evol Microbiol 55: 1021-1026.
  • 105. GEBA Lesson 3: Phylogeny driven genome selection (and phylogenetics) improves genome annotation • Took 56 GEBA genomes and compared results vs. 56 randomly sampled new genomes • Better definition of protein family sequence “patterns” • Greatly improves “comparative” and “evolutionary” based predictions • Conversion of hypothetical into conserved hypotheticals • Linking distantly related members of protein families • Improved non-homology prediction
  • 106. GEBA Lesson 4: Metadata Important
  • 107. GEBA Phylogenomic Lesson 5 Phylogeny-driven genome selection helps discover new genetic diversity
  • 108. Phylogenetic Distribution Novelty: Bacterial Actin Related Protein [Figure: phylogenetic tree of actin-related proteins; tip labels not recoverable] Haliangium ochraceum DSM 14365. Patrik D’haeseleer, Adam Zemla, Victor Kunin. Wu et al. 2009 Nature 462, 1056-1060. See also Guljamow et al. 2007 Current Biology.
  • 109. Network of Life Bacteria Archaea Eukaryotes Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 110. Protein Family Rarefaction Curves • Take data set of multiple complete genomes • Identify all protein families using MCL • Plot # of genomes vs. # of protein families
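A rarefaction curve of the kind described on slide 110 can be sketched as follows. This is a minimal illustration, not the procedure used in the cited paper: it assumes each genome has already been reduced to a set of protein family identifiers (for example from a prior clustering run), the genome and family names are invented, and the curve is averaged over random genome orderings.

```python
import random

def rarefaction_curve(genome_families, n_orders=100, seed=0):
    """Mean number of distinct families seen after adding 1..N genomes,
    averaged over `n_orders` random orderings of the genomes."""
    rng = random.Random(seed)
    names = list(genome_families)
    totals = [0] * len(names)
    for _ in range(n_orders):
        rng.shuffle(names)
        seen = set()
        for i, name in enumerate(names):
            seen |= genome_families[name]   # families discovered so far
            totals[i] += len(seen)
    return [t / n_orders for t in totals]

# Hypothetical input: genome -> set of protein family identifiers.
genome_families = {
    "g1": {"fam1", "fam2", "fam3"},
    "g2": {"fam1", "fam2", "fam4"},
    "g3": {"fam1", "fam5", "fam6"},
}

curve = rarefaction_curve(genome_families)
for n_genomes, n_families in enumerate(curve, start=1):
    print(n_genomes, n_families)
```

Plotting the number of genomes against these averages gives the curve; a set of genomes that keeps adding new families shows a steeper slope than a set of close relatives.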
  • 111. Wu et al. 2009 Nature 462, 1056-1060
  • 112. Wu et al. 2009 Nature 462, 1056-1060
  • 113. Wu et al. 2009 Nature 462, 1056-1060
  • 114. Wu et al. 2009 Nature 462, 1056-1060
  • 115. Wu et al. 2009 Nature 462, 1056-1060
  • 116. Synapomorphies exist Wu et al. 2009 Nature 462, 1056-1060
  • 117. Families/PD not uniform [Figure: labels not recoverable]
  • 118. Structural Novelty • Of the 17000 protein families in the GEBA56, 1800 are novel in sequence (Wu) • Structural modeling suggests many are structurally novel too (D'haeseleer) • 372 being crystallized by the PSI (Kerfeld)
  • 119. GEBA Phylogenomic Lesson 6 Improves analysis of genome data from uncultured organisms
  • 120. Shotgun Sequencing Allows Use of Other Markers [Figure: Sargasso Phylotypes; Weighted % of Clones by Major Phylogenetic Group for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 121. Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Figure: Sargasso Phylotypes; Weighted % of Clones by Major Phylogenetic Group for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 122. Phylogenetic Binning [Figure: Sargasso Phylotypes; Weighted % of Clones by Major Phylogenetic Group for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 123. Shotgun Sequencing Allows Use of Other Markers • Cannot be done without good sampling of genomes [Figure: Sargasso Phylotypes; Weighted % of Clones by Major Phylogenetic Group for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 124. Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis [Figure: Sargasso Phylotypes; Weighted % of Clones by Major Phylogenetic Group for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 125. GEBA Cyano Sequencing status (as of 01/14): Awaiting Material 11; Library 12; Production 22; Finishing 5; Grand Total 50. On-going/Planned Activities: - Building Cyanobacterial Metadatabase (IMG-GOLD) - 10th Cyanobacterial Molecular Biology Workshop, Lake Arrowhead, CA (06/10) --> Cheryl will host: Workshop training as prep for virtual Jamboree
  • 126. GEBA RNB Plan: Sequence multiple Root Nodule Bacteria (RNBs) across the planet. Pilot: 100 RNBs. Goal: • Understand biogeographical effects on species evolution and understand host-specificity. Rationale: • N2 fixation by legume pastures and crops provides 65% of the N currently utilized in agricultural production. • Contributes 25 to 90 million metric tonnes N per annum. • Symbioses save $US 6-10 billion annually on N fertilizer. • Grain and animal production enhanced by fixed nitrogen supplied by the symbiosis. Beta RNB: Cupriavidus, Burkholderia. Alpha RNB: Azorhizobium, Allorhizobium, Bradyrhizobium, Mesorhizobium, Rhizobium, Sinorhizobium, Devosia, Ochrobactrum, Phyllobacterium, Balneimonas-like. Nikos Kyrpides
  • 128. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Still not happy • NSF-funded Tree of Life Project • A genome from each of eight phyla [Figure: tree of bacterial phyla] Eisen & Ward, PIs
  • 129. Shotgun Sequencing Allows Use of Other Markers • GEBA Project improves metagenomic analysis, but only a little [Figure: Sargasso Phylotypes; Weighted % of Clones by Major Phylogenetic Group for markers EFG, EFTu, HSP70, RecA, RpoB, rRNA] Venter et al., Science 304: 66-74. 2004
  • 130. Phylogenomics Future 1 Need to adapt genomic and metagenomic methods to make better use of data
  • 132. Ways to Make Better Use of GEBA Data • Better phylogenetic methods for short reads • Rebuild protein family models • New phylogenetic markers • Need better phylogenies, including HGT • Improved tools for using distantly related genomes in metagenomic analysis
  • 133. Finding Metagenomic OTUs
    PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data
    Thomas J. Sharpton (1*), Samantha J. Riesenfeld (1), Steven W. Kembel (2), Joshua Ladau (1), James P. O’Dwyer (2,3), Jessica L. Green (2), Jonathan A. Eisen (4), Katherine S. Pollard (1,5)
    (1) The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America; (2) Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America; (3) Institute of Integrative and Comparative Biology, University of Leeds, Leeds, United Kingdom; (4) Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America; (5) Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America
    Abstract: Microbial diversity is typically characterized by clustering ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discover novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
    Citation: Sharpton TJ, Riesenfeld SJ, Kembel SW, Ladau J, O’Dwyer JP, et al. (2011) PhylOTU: A High-Throughput Procedure Quantifies Microbial Community Diversity and Resolves Novel Taxa from Metagenomic Data. PLoS Comput Biol 7(1): e1001061. doi:10.1371/journal.pcbi.1001061
    Editor: Oded Béjà, Technion-Israel Institute of Technology, Israel
    Received July 22, 2010; Accepted December 17, 2010; Published January 20, 2011
    Copyright: © 2011 Sharpton et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
  • 134. Distances between gene trees and the AMPHORA concatenated genome tree [Figure: two panels, NODAL distance (axis 0 to 6) and SPLIT distance (axis 0 to 0.9), one bar per gene: coaE, frr, guaA, infC, mraW, nusA, priA, purA, pyrH, radA, rplA, rplB, rplC, rplD, rplE, rplF, rplI, rplK, rplL, rplM, rplN, rplO, rplP, rplQ, rplR, rplS, rplT, rplU, rplV, rpmA, rpsB, rpsC, rpsE, rpsG, rpsH, rpsI, rpsJ, rpsK, rpsL, rpsM, rpsO, rpsP, rpsQ, rpsS, rRNA16S, ruvA, ruvB, secY, serS, smpB, tsf, ttf, trmD. Legend: AMPHORA marker; Ribosomal protein; Transcription/translation related protein; DNA repair protein; Protein of other function; Distance between the genome tree and 100 random trees (average ± standard deviation)]
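As an illustration of how a gene tree can be compared with a genome tree, the sketch below computes a normalized split distance. It assumes that "split distance" on slide 134 means a Robinson-Foulds-style comparison of bipartitions, which the slide does not state; the trees and taxon names are invented, and both trees must contain the same taxa.

```python
def leaves(tree):
    """Set of taxon names under a node; a tree is a name or a tuple of subtrees."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of the tree, each stored as the side
    that does not contain a fixed reference taxon."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        other = all_taxa - clade
        if len(clade) > 1 and len(other) > 1:
            found.add(other if ref in clade else clade)
        for child in node:
            visit(child)

    visit(tree)
    return found

def split_distance(tree_a, tree_b):
    """Fraction of splits found in only one of the two trees (0 = same topology)."""
    a, b = splits(tree_a), splits(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0

# Hypothetical trees on five taxa, written as nested tuples.
genome_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))

print(split_distance(genome_tree, genome_tree))
print(split_distance(genome_tree, gene_tree))
```

A gene whose tree shares most splits with the genome tree scores near 0, which is one plausible criterion for a good marker; comparing against random trees, as the slide does, gives a sense of what a poor score looks like.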
  • 135. Screen gene markers for any given taxonomic group
      Phylogenetic group       Genome Number   Gene Number   Marker Candidates
      Archaea                  62              145415        106
      Actinobacteria           63              267783        136
      Alphaproteobacteria      94              347287        121
      Betaproteobacteria       56              266362        311
      Gammaproteobacteria      126             483632        118
      Deltaproteobacteria      25              102115        206
      Epsilonproteobacteria    18              33416         455
      Bacteroides              25              71531         286
      Chlamydiae               13              13823         560
      Chloroflexi              10              33577         323
      Cyanobacteria            36              124080        590
      Firmicutes               106             312309        87
      Spirochaetes             18              38832         176
      Thermi                   5               14160         974
      Thermotogae              9               17037         684
  • 139. Phylogenomics Future 2 We have still only scratched the surface of microbial diversity
  • 140. rRNA Tree of Life Figure from Barton, Eisen et al. “Evolution”, CSHL Press. Based on tree from Pace NR, 2003.
  • 141. Phylogenetic Diversity: Sequenced Bacteria & Archaea From Wu et al. 2009 Nature 462, 1056-1060
  • 142. Phylogenetic Diversity with GEBA From Wu et al. 2009 Nature 462, 1056-1060
  • 143. Phylogenetic Diversity: Isolates From Wu et al. 2009 Nature 462, 1056-1060
  • 144. Phylogenetic Diversity: All From Wu et al. 2009 Nature 462, 1056-1060
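The comparisons on slides 141 to 144 rest on a phylogenetic diversity (PD) measure. A minimal sketch follows, assuming PD is taken as the total branch length connecting a chosen set of tips to the root of a rooted tree; the slides do not spell out the exact definition, and the tree, tip names and branch lengths here are invented.

```python
def phylogenetic_diversity(parent, tips):
    """Sum the lengths of all branches on the paths from `tips` to the root.
    `parent` maps each non-root node to (parent node, branch length)."""
    counted = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk upward until the root or an already counted branch is reached.
        while node in parent and node not in counted:
            counted.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Hypothetical rooted tree with branch lengths.
parent = {
    "n1": ("root", 0.5),
    "n2": ("root", 1.0),
    "tipA": ("n1", 1.0),
    "tipB": ("n1", 2.0),
    "tipC": ("n2", 4.0),
}

sequenced = ["tipA", "tipB"]
pd_before = phylogenetic_diversity(parent, sequenced)
pd_after = phylogenetic_diversity(parent, sequenced + ["tipC"])
print(pd_before, pd_after)
```

Adding a tip from a distant, unsampled branch (tipC here) raises PD far more than adding another close relative would, which is the intuition behind tree-guided genome selection.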
  • 145. • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Most phyla with cultured species are sparsely sampled • Lineages with no cultured taxa even more poorly sampled [Figure: tree of bacterial phyla; legend: Well sampled phyla, Poorly sampled, No cultured taxa]
  • 146. Uncultured Lineages: Technical Approaches • Get into culture • Enrichment cultures • If abundant in low diversity ecosystems • Flow sorting • Microbeads • Microfluidic sorting • Single cell amplification
  • 147. GEBA uncultured: Number of SAGs from Candidate Phyla
      Site                                     OD1   OP11   OP3   SAR406
      Site A: Hydrothermal vent                4     1      -     -
      Site B: Gold Mine                        6     13     2     -
      Site C: Tropical gyres (Mesopelagic)     -     -      -     2
      Site D: Tropical gyres (Photic zone)     1     -      -     -
    Sample collections at 4 additional sites are underway. Phil Hugenholtz
  • 148. Phylogenomics Future 3 Need Experiments from Across the Tree of Life too
  • 149. As of 2002 • At least 40 phyla of bacteria [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 150. As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 151. As of 2002 • At least 40 phyla of bacteria • Experimental studies are mostly from three phyla • Some studies in other phyla [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 152. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Eukaryotes [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 153. As of 2002 • At least 40 phyla of bacteria • Genome sequences are mostly from three phyla • Some other phyla are only sparsely sampled • Same trend in Viruses [Figure: tree of bacterial phyla] Based on Hugenholtz, 2002
  • 154. Need experimental studies from across the tree too [Figure: tree of bacterial phyla, scale bar 0.1] Tree based on Hugenholtz (2002) with some modifications.
  • 155. Adopt a Microbe [Figure: tree of bacterial phyla, scale bar 0.1] Tree based on Hugenholtz (2002) with some modifications.
  • 156. Conclusion • Phylogenetic sampling of genomes improves our understanding of microbial diversity in many ways • Still need – More biogeography – More phenotypic/experimental data – Deeper phylogenetic sampling
  • 159. A Happy Tree of Life
  • 160. Acknowledgements • GEBA: DOE-JGI, DSMZ • GWSS: Nancy Moran & lab, Dongying Wu • iSEEM: Katie Pollard, Jessica Green, Martin Wu • RecA: Dongying Wu, Craig Venter, Doug Rusch, et al.
