Class 2:
MIB200
Biology of Organisms without Nuclei
Class #2:
Phylogeny
UC Davis, Fall 2019
Instructor: Jonathan Eisen
Hugenholtz et al. 1998
Woese 1987
Some Questions
• What is a phylogenetic tree?
• What can be shown in a phylogenetic tree?
• How does one infer a phylogenetic tree?
• How does one know if a tree is correct?
• How can one use phylogenetic trees?
• What is the difference between a gene tree and a species tree?
MIB200A at UCDavis Module: Microbial Phylogeny; Class 2
Raff J. How to Read and Understand a Scientific Article
https://violentmetaphors.files.wordpress.com/2018/01/how-to-read-and-understand-a-scientific-article.pdf
1. Begin by reading the introduction, not the abstract.
2. Identify the big question.
3. Summarize the background in five sentences or less.
4. Identify the specific question(s).
5. Identify the approach.
6. Read the methods section.
7. Read the results section.
8. Determine whether the results answer the specific question(s).
9. Read the conclusion/discussion/interpretation section.
10. Go back to the beginning and read the abstract.
11. Find out what other researchers say about the paper.
Baldauf Main Topics
• Terminology
• Groups
• Trees
• Roots
• Homology
• Inferring Trees
  • Step 1. Assembling a dataset
  • Step 2. Multiple sequence alignment – the heart of the matter
  • Step 3. Trees – methods, models and madness
  • Step 4. Tests – telling the forest from the trees
  • Step 5. Data presentation
A phylogenetic tree is composed of branches (edges) and nodes.
Branches connect nodes; a node is the point at which two (or more)
branches diverge. Branches and nodes can be internal or external
(terminal). An internal node corresponds to the hypothetical last
common ancestor (LCA) of everything arising from it. Terminal
nodes correspond to the sequences from which the tree was
derived (also referred to as operational taxonomic units or ‘OTUs’).
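To make this terminology concrete, here is a minimal Python sketch of a rooted tree built from nodes. The `Node` class, its method names, and the four-taxon example are illustrative assumptions, not something taken from the lecture or from Baldauf.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A node in a rooted tree; the branch to its parent is implicit."""
    name: Optional[str] = None                      # label for terminal nodes (OTUs)
    children: List["Node"] = field(default_factory=list)

    def is_terminal(self) -> bool:
        # Terminal (external) nodes have no descendants.
        return not self.children

    def terminals(self) -> List[str]:
        """Names of all terminal nodes (OTUs) arising from this node."""
        if self.is_terminal():
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.terminals())
        return names

    def count_internal(self) -> int:
        """Number of internal nodes (hypothetical ancestors) at or below this node."""
        if self.is_terminal():
            return 0
        return 1 + sum(child.count_internal() for child in self.children)


# Hypothetical four-taxon tree: ((a, b), (c, d))
ab = Node(children=[Node("a"), Node("b")])   # LCA of a and b
cd = Node(children=[Node("c"), Node("d")])   # LCA of c and d
root = Node(children=[ab, cd])               # LCA of everything in the tree

print(root.terminals())
print(root.count_internal())
```

Each internal node stands for the hypothetical last common ancestor of the terminal nodes returned by its `terminals()` method.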
[Figure: Parts of a phylogenetic tree. Labels: terminal (or tip) taxa a to h; internal nodes (labels u, v, w, x, y, z, t), which represent hypothetical ancestral taxa; root (root node); internal branches; terminal branches.]
Groups
Types of Trees
Tree Roots
At the base of a phylogenetic tree is its ‘root’. This is the oldest point
in the tree, and it, in turn, implies the order of branching in the rest
of the tree; that is, who shares a more recent common ancestor with
whom. The only way to root a tree is with an ‘outgroup’, an external
point of reference. An outgroup is anything that is not a natural
member of the group of interest (i.e. the ‘ingroup’).
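The effect of the outgroup choice can be sketched in a few lines of Python. The adjacency-list representation, the function name `root_with_outgroup`, and the taxon labels below are assumptions made for illustration only; the sketch places the root on the branch leading to the chosen outgroup and reads off the branching order that follows.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]


def root_with_outgroup(adj: Dict[str, List[str]], outgroup: str) -> Tree:
    """Root an unrooted tree on the branch that leads to `outgroup`.

    `adj` maps every node (tips and internal nodes) to its neighbours.
    Returns nested tuples: (outgroup, ingroup_subtree).
    """
    neighbours = adj[outgroup]
    if len(neighbours) != 1:
        raise ValueError("the outgroup must be a terminal node")
    attach = neighbours[0]

    def subtree(node: str, parent: str) -> Tree:
        # Walk away from the root; every neighbour except the parent is a child.
        kids = [n for n in adj[node] if n != parent]
        if not kids:
            return node
        return tuple(subtree(k, node) for k in kids)

    return (outgroup, subtree(attach, outgroup))


# Hypothetical unrooted tree with tips A, B, C, O and internal nodes x, y.
adj = {
    "A": ["x"], "B": ["x"], "C": ["y"], "O": ["y"],
    "x": ["A", "B", "y"],
    "y": ["x", "C", "O"],
}

print(root_with_outgroup(adj, "O"))   # O as outgroup: A and B are closest relatives
print(root_with_outgroup(adj, "A"))   # A as outgroup: C and O are closest relatives
```

The same unrooted topology implies a different order of branching depending on which taxon is treated as the outgroup, which is why the choice matters.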
Rooting
Woese 1987
Slides by Jonathan Eisen for BIS2C at UC Davis Spring 2016
[Figure: Unrooted Tree of Life from Woese, with ROOT marked]
[Figure: Unrooted Tree of Life from Woese, with ROOT marked and MAJOR DEBATE/AMBIGUITIES noted]
[Figure: Alternative Position of Eukaryote Branch, with ROOT marked]
Orthology vs. Paralogy
Evolution is about homology; that is, the similarity due to common ancestry.
Sequence Alignment
Refining Alignment
The methods for calculating phylogenetic trees fall into two general
categories. These are distance-matrix methods, also known as
clustering or algorithmic methods (e.g. UPGMA, neighbour-joining,
Fitch–Margoliash), and discrete data methods, also known as tree
searching methods (e.g. parsimony, maximum likelihood, Bayesian
methods).
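As an illustration of the distance-matrix category, the following is a minimal sketch of UPGMA clustering in Python. The function name `upgma`, the dictionary-of-`frozenset` distance format, and the four-taxon distances are assumptions chosen for the example; branch lengths are omitted, and UPGMA itself assumes roughly constant rates of change across lineages.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    `dist` maps frozenset({a, b}) to the distance between taxa a and b.
    """
    clusters = {name: 1 for name in names}     # cluster -> number of tips in it
    d = dict(dist)
    while len(clusters) > 1:
        # Join the pair of clusters separated by the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        for c in clusters:
            # Distance to the new cluster is the size-weighted average.
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical pairwise distances for four taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

print(upgma(names, dist))
```

The algorithm follows a fixed recipe from matrix to tree without comparing alternative trees, which is the sense in which such methods are called algorithmic rather than tree searching.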
Bootstrapping
Long branch attraction
Phylogenomics
Eisen 1998 Major Topics
• Sequence Similarity, Homology, and Functional Predictions
• Identification of Homologs
• Alignment and Masking
• Phylogenetic Trees
• Functional Predictions
…evolutionary information can be used to improve functional predictions. Below, I present an outline of one such phylogenomic method (see Fig. 1), and I compare this method to nonevolutionary functional prediction methods. This method is based on a relatively simple assumption: because gene functions change as a result of evolution, reconstructing the evolutionary history of genes should help predict the functions of uncharacterized genes. The first step is the generation of a phylogenetic tree representing the evolutionary history of the gene of interest and its homologs. Such trees are distinct from clusters and other means of characterizing sequence similarity because they are inferred by special techniques that help convert patterns of similarity into evolutionary relationships (see Swofford et al. 1996). After the gene tree is inferred, biologically determined functions of the various homologs are overlaid onto the tree. Finally, the structure of the tree and the relative phylogenetic positions of genes of different functions are used to trace the history of functional changes, which is then used to predict functions of uncharacterized genes. More detail of this method is provided below.
Identification of Homologs
The first step in studying the evolution of a particular gene is the identification of homologs. As with similarity-based functional prediction methods, likely homologs of a particular gene are iden…
[Adjacent column cropped in the slide; surviving fragments: database … erated se … BLAST (A … family is … ers), it ma … a subset … must be d … might ac … that wou … sis. Alignment … Sequence … analysis h … the assign … Each col … alignmen … acids or … mon evol … umn is tr … genetic a … which th … mology … cluded (G … sion of ce … known as … genetic m … natory po … ated with … many seq … ages) are … the evolu … with mas … Phylogene … For exten … ating phy]
Table 1. Methods of Predicting Gene Function When Homologs Have Multiple Functions
Highest Hit: The uncharacterized gene is assigned the function (or frequently, the annotated function) of the gene that is identified as the highest hit by a similarity search program (e.g., Tomb et al. 1997).
Top Hits: Identify top 10+ hits for the uncharacterized gene. Depending on the degree of consensus of the functions of the top hits, the query sequence is assigned a specific function, a general activity with unknown specificity, or no function (e.g., Blattner et al. 1997).
Clusters of Orthologous Groups: Genes are divided into groups of orthologs based on a cluster analysis of pairwise similarity scores between genes from different species. Uncharacterized genes are assigned the function of characterized orthologs (Tatusov et al. 1997).
Phylogenomics: Known functions are overlaid onto an evolutionary tree of all homologs. Functions of uncharacterized genes are predicted by their phylogenetic position relative to characterized genes (e.g., Eisen et al. 1995, 1997).
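The first two similarity-based approaches in Table 1 can be contrasted with a short Python sketch. The function names, the `(score, function)` hit format, the consensus threshold `min_fraction`, and the example hits are all hypothetical; the sketch also simplifies Top Hits to two outcomes (a function or no assignment) rather than the three described in the table.

```python
from collections import Counter


def highest_hit(hits):
    """Assign the function of the single best-scoring hit."""
    return max(hits, key=lambda h: h[0])[1]


def top_hits(hits, n=10, min_fraction=0.8):
    """Assign a function only if enough of the top n hits agree on it."""
    top = sorted(hits, key=lambda h: h[0], reverse=True)[:n]
    function, count = Counter(f for _, f in top).most_common(1)[0]
    return function if count / len(top) >= min_fraction else None


# Hypothetical search results as (similarity score, annotated function).
hits = [
    (250, "function X"),
    (240, "function Y"),
    (230, "function Y"),
    (220, "function Y"),
]

print(highest_hit(hits))                       # function of the best hit only
print(top_hits(hits, n=4, min_fraction=0.8))   # consensus too weak: no assignment
print(top_hits(hits, n=4, min_fraction=0.7))   # looser threshold: majority function
```

Neither function uses the evolutionary relationships among the hits, which is the information the phylogenomic approach adds.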
Insight/Outlook
…greatly from more data, it is useful to augment this initial list by using identified homologs as queries for further…
…commonly used: parsimony, distance, and maximum likelihood (Table 3), and each has its advantages and disadvantages. I…
Table 2. Types of Molecular Homology
Homolog: Genes that are descended from a common ancestor (e.g., all globins)
Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human β- and chimp β-globin)
Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., β- and γ-globin)
Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)
Positional homology: Common ancestry of specific amino acid or nucleotide positions in different genes
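The ortholog and paralog definitions in Table 2 can be applied mechanically once each internal node of a gene tree is labelled with the event it represents. The Python sketch below assumes such labels are already known; the nested-tuple tree format, the function names, and the simplified globin tree are illustrative assumptions.

```python
def leaves(tree):
    """All gene names at the tips of a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)


def classify_pairs(tree, out=None):
    """Label every pair of genes by the event at their last common ancestor."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        return out
    event, left, right = tree
    label = "ortholog" if event == "speciation" else "paralog"
    # Genes on opposite sides of this node diverged at this node's event.
    for a in leaves(left):
        for b in leaves(right):
            out[frozenset((a, b))] = label
    classify_pairs(left, out)
    classify_pairs(right, out)
    return out


# Simplified, hypothetical globin gene tree: an ancient duplication gave rise
# to the beta and gamma lineages, each later split by the human/chimp speciation.
globin_tree = (
    "duplication",
    ("speciation", "human_beta", "chimp_beta"),
    ("speciation", "human_gamma", "chimp_gamma"),
)

relationships = classify_pairs(globin_tree)
print(relationships[frozenset(("human_beta", "chimp_beta"))])    # ortholog
print(relationships[frozenset(("human_beta", "human_gamma"))])   # paralog
```

Note that human β-globin and chimp γ-globin come out as paralogs even though they are in different species, because their last common ancestor is the duplication node.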
…al. 1989). However, examination of the percent similarity between mycoplasmal genes and their homologs in bacteria does not clearly show this relationship. This is because mycoplasmas have undergone an accelerated rate of molecular evolution relative to other bacteria. Thus, a BLAST search with a gene from Bacillus subtilis (a low GC Gram-positive species) will result in a list in which the mycoplasma homologs (if they exist) score lower than genes from many spe…
Table 3. Molecular Phylogenetic Methods
Parsimony: Possible trees are compared and each is given a score that is a reflection of the minimum number of character state changes (e.g., amino acid substitutions) that would be required over evolutionary time to fit the sequences into that tree. The optimal tree is considered to be the one requiring the fewest changes (the most parsimonious tree).
Distance: The optimal tree is generated by first calculating the estimated evolutionary distance between all pairs of sequences. Then these distances are used to generate a tree in which the branch patterns and lengths best represent the distance matrix.
Maximum likelihood: Maximum likelihood is similar to parsimony methods in that possible trees are compared and given a score. The score is based on how likely the given sequences are to have evolved in a particular tree given a model of amino acid or nucleotide substitution probabilities. The optimal tree is considered to be the one that has the highest probability.
Bootstrapping: Alignment positions within the original multiple sequence alignment are resampled and new data sets are made. Each bootstrapped data set is used to generate a separate phylogenetic tree and the trees are compared. Each node of the tree can be given a bootstrap percentage indicating how frequently those species joined by that node group together in different trees. Bootstrap percentage does not correspond directly to a confidence limit.
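The resampling step of bootstrapping can be sketched in Python as follows. Everything here is a simplified assumption for illustration: the toy alignment, the function names, and above all the use of "which two sequences are closest by p-distance" as a stand-in for inferring a full tree from each replicate.

```python
import random


def resample_columns(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def p_distance(s1, s2):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def closest_pair(alignment):
    """Stand-in for tree inference: the pair of sequences with the smallest distance."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def bootstrap_support(alignment, pair, replicates=100, seed=0):
    """Percentage of replicates in which `pair` is recovered as the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical toy alignment of four sequences.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACTTGC",
    "D": "TCGAACTTGG",
}

print(bootstrap_support(alignment, ("A", "B")))
```

In a real analysis each replicate would be passed to a tree-building method, and the percentage would be recorded per node of the tree rather than for a single pair.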
Insight/Outlook
MIB200A at UCDavis Module: Microbial Phylogeny; Class 2

More Related Content

PDF
MIB200A at UCDavis Module: Microbial Phylogeny; Class 3
PDF
MIB200A at UCDavis Module: Microbial Phylogeny; Class 1
DOCX
Zoology holidaybreak review
PPTX
Modeling evolution in the classroom: The case of Fukushima’s mutant butterflies
PDF
Pathogen Genome Data
PDF
Lecture 02 (2 04-2021) phylogeny
PDF
Smita T. Gosain Resume.PDF
DOC
2.3 Phylogenetic Trees
MIB200A at UCDavis Module: Microbial Phylogeny; Class 3
MIB200A at UCDavis Module: Microbial Phylogeny; Class 1
Zoology holidaybreak review
Modeling evolution in the classroom: The case of Fukushima’s mutant butterflies
Pathogen Genome Data
Lecture 02 (2 04-2021) phylogeny
Smita T. Gosain Resume.PDF
2.3 Phylogenetic Trees

What's hot (11)

PPSX
Nikon Small World, Photography Competition 2015
PPTX
Web Apollo: Lessons learned from community-based biocuration efforts.
PPTX
The science of biology
PPTX
BiPday 2014 -- Vicario Saverio
PDF
Lecture 01 (2 02-2021) slides
PPTX
Sc. fair research paper (2012 2013)
PDF
First Pages Cotner 907-6
PPTX
characteristics used in classification of micro-organism
DOCX
BIO 280 Technology levels--snaptutorial.com
DOC
10. monica rivera 4th seminarreflectionoctober29
DOC
Unit 7 unit at a glance
Nikon Small World, Photography Competition 2015
Web Apollo: Lessons learned from community-based biocuration efforts.
The science of biology
BiPday 2014 -- Vicario Saverio
Lecture 01 (2 02-2021) slides
Sc. fair research paper (2012 2013)
First Pages Cotner 907-6
characteristics used in classification of micro-organism
BIO 280 Technology levels--snaptutorial.com
10. monica rivera 4th seminarreflectionoctober29
Unit 7 unit at a glance
Ad

Similar to MIB200A at UCDavis Module: Microbial Phylogeny; Class 2 (20)

PPTX
Phtogenetics ppt for basics understanding
PPT
Phylogenetic alignment analysis an important tool in computational biology
PPT
lecture.ppt..........................................
PPTX
Bioinformatics presentation shabir .pptx
PDF
phylogenetics.pdf
PPT
Basics of constructing Phylogenetic tree.ppt
PDF
Phylolecture
PPTX
Parsimony methods
PPTX
Phylogenetic tree by Dr. Amrita Saxena.pptx
PPTX
phylogenetic tree.pptx
PDF
evolution and molecular phylogenetics.pdf
PPTX
Computational phylogenetics theoretical concepts, methods with practical on C...
PPTX
Phylogeny-Abida.pptx
DOCX
Report on Phylogenetic tree
PDF
07_Phylogeny_2022.pdf
PPT
Topic 7 Phylogeny.ppt
PPT
Multiple Sequence Alignment-just glims of viewes on bioinformatics.
PPTX
Phylogenetic tree construction step by step
PDF
Methods of illustrating evolutionary relationship
PDF
Plant Phylogeny Lab
Phtogenetics ppt for basics understanding
Phylogenetic alignment analysis an important tool in computational biology
lecture.ppt..........................................
Bioinformatics presentation shabir .pptx
phylogenetics.pdf
Basics of constructing Phylogenetic tree.ppt
Phylolecture
Parsimony methods
Phylogenetic tree by Dr. Amrita Saxena.pptx
phylogenetic tree.pptx
evolution and molecular phylogenetics.pdf
Computational phylogenetics theoretical concepts, methods with practical on C...
Phylogeny-Abida.pptx
Report on Phylogenetic tree
07_Phylogeny_2022.pdf
Topic 7 Phylogeny.ppt
Multiple Sequence Alignment-just glims of viewes on bioinformatics.
Phylogenetic tree construction step by step
Methods of illustrating evolutionary relationship
Plant Phylogeny Lab
Ad

More from Jonathan Eisen (20)

PDF
Eisen.CentralValley2024.pdf
PDF
Phylogenomics and the Diversity and Diversification of Microbes
PDF
Talk by Jonathan Eisen for LAMG2022 meeting
PDF
Thoughts on UC Davis' COVID Current Actions
PDF
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi...
PDF
A Field Guide to Sars-CoV-2
PDF
EVE198 Summer Session Class 4
PDF
EVE198 Summer Session 2 Class 1
PDF
EVE198 Summer Session 2 Class 2 Vaccines
PDF
EVE198 Spring2021 Class1 Introduction
PDF
EVE198 Spring2021 Class2
PDF
EVE198 Spring2021 Class5 Vaccines
PDF
EVE198 Winter2020 Class 8 - COVID RNA Detection
PDF
EVE198 Winter2020 Class 1 Introduction
PDF
EVE198 Winter2020 Class 3 - COVID Testing
PDF
EVE198 Winter2020 Class 5 - COVID Vaccines
PDF
EVE198 Winter2020 Class 9 - COVID Transmission
PDF
EVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
PDF
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
PDF
EVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction
Eisen.CentralValley2024.pdf
Phylogenomics and the Diversity and Diversification of Microbes
Talk by Jonathan Eisen for LAMG2022 meeting
Thoughts on UC Davis' COVID Current Actions
Phylogenetic and Phylogenomic Approaches to the Study of Microbes and Microbi...
A Field Guide to Sars-CoV-2
EVE198 Summer Session Class 4
EVE198 Summer Session 2 Class 1
EVE198 Summer Session 2 Class 2 Vaccines
EVE198 Spring2021 Class1 Introduction
EVE198 Spring2021 Class2
EVE198 Spring2021 Class5 Vaccines
EVE198 Winter2020 Class 8 - COVID RNA Detection
EVE198 Winter2020 Class 1 Introduction
EVE198 Winter2020 Class 3 - COVID Testing
EVE198 Winter2020 Class 5 - COVID Vaccines
EVE198 Winter2020 Class 9 - COVID Transmission
EVE198 Fall2020 "Covid Mass Testing" Class 8 Vaccines
EVE198 Fall2020 "Covid Mass Testing" Class 2: Viruses, COIVD and Testing
EVE198 Fall2020 "Covid Mass Testing" Class 1 Introduction

Recently uploaded (20)

PDF
CHAPTER 2 The Chemical Basis of Life Lecture Outline.pdf
PPTX
Probability.pptx pearl lecture first year
PPT
Animal tissues, epithelial, muscle, connective, nervous tissue
PDF
Wound infection.pdfWound infection.pdf123
PDF
Packaging materials of fruits and vegetables
PDF
Unit 5 Preparations, Reactions, Properties and Isomersim of Organic Compounds...
PPTX
Substance Disorders- part different drugs change body
PDF
BET Eukaryotic signal Transduction BET Eukaryotic signal Transduction.pdf
PPT
LEC Synthetic Biology and its application.ppt
PPTX
perinatal infections 2-171220190027.pptx
PPTX
Hypertension_Training_materials_English_2024[1] (1).pptx
PPTX
Microbes in human welfare class 12 .pptx
PPTX
ap-psych-ch-1-introduction-to-psychology-presentation.pptx
PDF
Communicating Health Policies to Diverse Populations (www.kiu.ac.ug)
PPTX
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
PDF
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
PDF
Science Form five needed shit SCIENEce so
PPT
Computional quantum chemistry study .ppt
PPT
Biochemestry- PPT ON Protein,Nitrogenous constituents of Urine, Blood, their ...
PDF
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...
CHAPTER 2 The Chemical Basis of Life Lecture Outline.pdf
Probability.pptx pearl lecture first year
Animal tissues, epithelial, muscle, connective, nervous tissue
Wound infection.pdfWound infection.pdf123
Packaging materials of fruits and vegetables
Unit 5 Preparations, Reactions, Properties and Isomersim of Organic Compounds...
Substance Disorders- part different drugs change body
BET Eukaryotic signal Transduction BET Eukaryotic signal Transduction.pdf
LEC Synthetic Biology and its application.ppt
perinatal infections 2-171220190027.pptx
Hypertension_Training_materials_English_2024[1] (1).pptx
Microbes in human welfare class 12 .pptx
ap-psych-ch-1-introduction-to-psychology-presentation.pptx
Communicating Health Policies to Diverse Populations (www.kiu.ac.ug)
SCIENCE 4 Q2W5 PPT.pptx Lesson About Plnts and animals and their habitat
Looking into the jet cone of the neutrino-associated very high-energy blazar ...
Science Form five needed shit SCIENEce so
Computional quantum chemistry study .ppt
Biochemestry- PPT ON Protein,Nitrogenous constituents of Urine, Blood, their ...
Worlds Next Door: A Candidate Giant Planet Imaged in the Habitable Zone of ↵ ...

MIB200A at UCDavis Module: Microbial Phylogeny; Class 2

  • 1. Class 2: MIB200 Biology of Organisms without Nuclei Class #2: Phylogeny UC Davis, Fall 2019 Instructor: Jonathan Eisen 1
  • 4. Some Questions • What is a phylogenetic tree? • What can be shown in a phylogenetic tree? • How does one infer a phylogenetic tree? • How does one know if a tree is correct? • How can one use phylogenetic trees? • What is the difference between a gene tree and a species tree?
  • 7. Raff J. How to Read and Understand a Scientific Article 1. Begin by reading the introduction, not the abstract. https://violentmetaphors.files.wordpress.com/2018/01/how-to-read-and-understand-a-scientific-article.pdf 2. Identify the big question. 3. Summarize the background in five sentences or less. 4. Identify the specific question(s). 5. Identify the approach. 6. Read the methods section. 7. Read the results section. 8. Determine whether the results answer the specific question(s). 9. Read the conclusion/discussion/interpretation section. 10. Go back to the beginning and read the abstract. 11. Find out what other researchers say about the paper.
  • 8. Raff J. How to Read and Understand a Scientific Article 1. Begin by reading the introduction, not the abstract. https://violentmetaphors.files.wordpress.com/2018/01/how-to-read-and-understand-a-scientific-article.pdf 2. Identify the big question. 3. Summarize the background in five sentences or less. 4. Identify the specific question(s). 5. Identify the approach. 6. Read the methods section. 7. Read the results section. 8. Determine whether the results answer the specific question(s). 9. Read the conclusion/discussion/interpretation section. 10. Go back to the beginning and read the abstract. 11. Find out what other researchers say about the paper. X
  • 10. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 11. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 12. A phylogenetic tree is composed of branches (edges) and nodes. Branches connect nodes; a node is the point at which two (or more) branches diverge. Branches and nodes can be internal or external (terminal). An internal node corresponds to the hypothetical last common ancestor (LCA) of everything arising from it. Terminal nodes correspond to the sequences from which the tree was derived (also referred to as operational taxonomic units or ‘OTUs’).
  • 13. Internal nodes represent hypothetical ancestral taxa a b c d e f g h root, root node terminal (or tip) taxa internal nodes internal branches u v w x y z t Terminal branches Parts of a phylogenetic tree 13
  • 14. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 16. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 18. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 20. Tree Roots At the base of a phylogenetic tree is its ‘root’. This is the oldest point in the tree, and it, in turn, implies the order of branching in the rest of the tree; that is, who shares a more recent common ancestor with whom. The only way to root a tree is with an ‘outgroup’, an external point of reference. An outgroup is anything that is not a natural member of the group of interest (i.e. the ‘ingroup’
  • 23. Slides by Jonathan Eisen for BIS2C at UC Davis Spring 2016 Unrooted Tree of Life from Woese 23 ROOT
  • 24. Slides by Jonathan Eisen for BIS2C at UC Davis Spring 2016 Unrooted Tree of Life from Woese 24 ROOT MAJOR DEBATE/AMBIGUITIES
  • 25. Slides by Jonathan Eisen for BIS2C at UC Davis Spring 2016 Alternative Position of Eukaryote Branch 25 ROOT
  • 26. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 28. Orthology vs. Paralogy Evolution is about homology; that is, the similarity due to common ancestry.
  • 29. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 30. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 32. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 35. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 36. The methods for calculating phylogenetic trees fall into two general categories. These are distance-matrix methods, also known as clustering or algorithmic methods (e.g. UPGMA, neighbour-joining, Fitch–Margoliash), and discrete data methods, also known as tree searching methods (e.g. parsimony, maximum likelihood, Bayesian methods)
  • 37. Baldauf Main Topics • Terminology • Groups • Trees • Roots • Homology • Inferring Trees ! Step 1. Assembling a dataset ! Step 2. Multiple sequence alignment – the heart of the matter ! Step 3. Trees – methods, models and madness ! Step 4. Tests – telling the forest from the trees ! Step 5. Data presentation
  • 43. Eisen 1998 Major Topics • Sequence Similarity, Homology, and Functional Predictions • Identification of Homologs • Alignment and Masking • Phylogenetic Trees • Functional Predictions
  • 44. Eisen 1998 Major Topics • Sequence Similarity, Homology, and Functional Predictions • Identification of Homologs • Alignment and Masking • Phylogenetic Trees • Functional Predictions
  • 45. tion ary in form ation can be used to im - prove fun ction al prediction s. Below, I presen t an outlin e of on e such phylog- enomic m eth od (see Fig. 1), an d I com - pare th is m eth od to n on evolution ary fun ction al prediction m eth ods. Th is m eth od is based on a relatively sim ple assum ption —because gen e fun ction s ch an ge as a result of evolution , recon - structin g th e evolution ary h istory of gen es sh ould h elp predict th e fun ction s of un ch aracterized gen es. Th e first step is th e gen eration of a ph ylogen etic tree represen tin g th e evolution ary h istory of th e gen e of in terest an d its h om ologs. Such trees are distin ct from clusters an d oth er m ean s of ch aracterizin g sequen ce sim ilarity because th ey are in ferred by special tech n iques th at h elp con vert pat- tern s of sim ilarity in to evolution ary re- lation sh ips (see Swofford et al. 1996). Af- ter th e gen e tree is in ferred, biologically determ in ed fun ction s of th e various h o- m ologs are overlaid on to th e tree. Fi- n ally, th e structure of th e tree an d th e relative ph ylogen etic position s of gen es of differen t fun ction s are used to trace th e h istory of fun ction al ch an ges, wh ich is th en used to predict fun ction s of un - ch aracterized gen es. More detail of th is m eth od is provided below. Identification of Homologs Th e first step in studyin g th e evolution of a particular gen e is th e iden tification of h om ologs. As with sim ilarity-based fun ction al prediction m eth ods, likely h om ologs of a particular gen e are iden - database erated se BLAST (A fam ily is ers), it m a a subset m ust be d m igh t ac th at wou sis. Alignment Sequen ce an alysis h th e assign Each col align m en acids or m on evol um n is tr gen etic a wh ich th m ology cluded (G sion of ce kn own as gen etic m n atory po ated with m an y seq ages) are th e evolu with m as Phylogene For exten atin g ph y Table 1. 
Methods of Predicting Gene Function When Homologs Have Multiple Functions Highest Hit The uncharacterized gene is assigned the function (or frequently, the annotated function) of the gene that is identified as the highest hit by a similarity search program (e.g., Tomb et al. 1997). Top Hits Identify top 10+ hits for the uncharacterized gene. Depending on the degree of consensus of the functions of the top hits, the query sequence is assigned a specific function, a general activity with unknown specificity, or no function (e.g., Blattner et al. 1997). Clusters of Orthologous Groups Genes are divided into groups of orthologs based on a cluster analysis of pairwise similarity scores between genes from different species. Uncharacterized genes are assigned the function of characterized orthologs (Tatusov et al. 1997). Phylogenomics Known functions are overlaid onto an evolutionary tree of all homologs. Functions of uncharacterized genes are predicted by their phylogenetic position relative to characterized genes (e.g., Eisen et al. 1995, 1997). Insight/Outlook
  • 46. Eisen 1998 Major Topics • Sequence Similarity, Homology, and Functional Predictions • Identification of Homologs • Alignment and Masking • Phylogenetic Trees • Functional Predictions
  • 47. [...] greatly from more data, it is useful to augment this initial list by using identified homologs as queries for further [...] commonly used: parsimony, distance, and maximum likelihood (Table 3), and each has its advantages and disadvantages. [...]
    Table 2. Types of Molecular Homology
    Homolog: Genes that are descended from a common ancestor (e.g., all globins)
    Ortholog: Homologous genes that have diverged from each other after speciation events (e.g., human β- and chimp β-globin)
    Paralog: Homologous genes that have diverged from each other after gene duplication events (e.g., β- and γ-globin)
    Xenolog: Homologous genes that have diverged from each other after lateral gene transfer events (e.g., antibiotic resistance genes in bacteria)
    Positional homology: Common ancestry of specific amino acid or nucleotide positions in different genes
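The definitions in Table 2 can be read as a rule on a gene tree: the relationship between two homologs depends on the kind of event at their most recent common ancestor. The sketch below assumes a gene tree whose internal nodes have already been labeled with an event type; the node format, the event labels, and the simplified globin tree are illustrative assumptions, not data from the slides.

```python
# Classify a pair of homologs by the event at their most recent common
# ancestor. A node is either a leaf name (str) or a tuple
# (event, child, child, ...); this format is an assumption of the sketch.

RELATION = {
    "speciation": "ortholog",
    "duplication": "paralog",
    "transfer": "xenolog",
}

def leaves(node):
    """Return the leaf names below a node."""
    if isinstance(node, str):
        return [node]
    _event, *children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def relation(node, a, b):
    """Return 'ortholog', 'paralog', or 'xenolog' for genes a and b,
    or None if they are not both (distinct) leaves under this node."""
    if isinstance(node, str):
        return None
    event, *children = node
    for child in children:
        names = leaves(child)
        if a in names and b in names:
            return relation(child, a, b)  # common ancestor lies deeper
    names = leaves(node)
    if a in names and b in names:
        return RELATION[event]  # this node is the most recent common ancestor
    return None

# Simplified, hypothetical globin gene tree: one duplication (beta vs gamma)
# followed by a human/chimp speciation in each copy.
GLOBINS = ("duplication",
           ("speciation", "human_beta", "chimp_beta"),
           ("speciation", "human_gamma", "chimp_gamma"))

print(relation(GLOBINS, "human_beta", "chimp_beta"))   # ortholog
print(relation(GLOBINS, "human_beta", "human_gamma"))  # paralog
```

Note that human_beta and chimp_gamma also come out as paralogs here, because their most recent common ancestor is the duplication node even though the genes sit in different species.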
  • 48. Eisen 1998 Major Topics • Sequence Similarity, Homology, and Functional Predictions • Identification of Homologs • Alignment and Masking • Phylogenetic Trees • Functional Predictions
  • 50. [...] al. 1989). However, examination of the percent similarity between mycoplasmal genes and their homologs in bacteria does not clearly show this relationship. This is because mycoplasmas have undergone an accelerated rate of molecular evolution relative to other bacteria. Thus, a BLAST search with a gene from Bacillus subtilis (a low GC Gram-positive species) will result in a list in which the mycoplasma homologs (if they exist) score lower than genes from many species [...]
    Table 3. Molecular Phylogenetic Methods
    Parsimony: Possible trees are compared and each is given a score that is a reflection of the minimum number of character state changes (e.g., amino acid substitutions) that would be required over evolutionary time to fit the sequences into that tree. The optimal tree is considered to be the one requiring the fewest changes (the most parsimonious tree).
    Distance: The optimal tree is generated by first calculating the estimated evolutionary distance between all pairs of sequences. Then these distances are used to generate a tree in which the branch patterns and lengths best represent the distance matrix.
    Maximum likelihood: Maximum likelihood is similar to parsimony methods in that possible trees are compared and given a score. The score is based on how likely the given sequences are to have evolved in a particular tree given a model of amino acid or nucleotide substitution probabilities. The optimal tree is considered to be the one that has the highest probability.
    Bootstrapping: Alignment positions within the original multiple sequence alignment are resampled and new data sets are made. Each bootstrapped data set is used to generate a separate phylogenetic tree and the trees are compared. Each node of the tree can be given a bootstrap percentage indicating how frequently those species joined by that node group together in different trees. Bootstrap percentage does not correspond directly to a confidence limit.
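The parsimony and bootstrapping rows of Table 3 can be sketched for the smallest interesting case, four sequences with three possible groupings. This is a minimal illustration under stated assumptions: the scoring uses Fitch's small-parsimony algorithm with equal-cost changes, the four sequences are made up, and the bootstrap simply counts which of the three candidate trees scores best in each resampled alignment (ties go to the first tree listed) rather than building a full tree per replicate.

```python
import random

def fitch(tree, column):
    """Fitch pass on a rooted binary tree (nested 2-tuples of leaf names).
    Returns (possible ancestral states, minimum number of changes)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, column)
    rset, rcost = fitch(right, column)
    common = lset & rset
    if common:
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # one extra change is required

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total

def bootstrap_support(trees, alignment, replicates=100, seed=0):
    """Resample columns with replacement and report, per candidate tree,
    the percentage of replicates in which it had the lowest score."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    wins = {label: 0 for label in trees}
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: "".join(alignment[n][i] for i in cols) for n in names}
        best = min(trees, key=lambda t: parsimony_score(trees[t], resampled))
        wins[best] += 1  # ties are credited to the first tree listed
    return {label: 100 * count / replicates for label, count in wins.items()}

# Hypothetical aligned sequences and the three groupings of four taxa.
ALIGNMENT = {"A": "AAGCT", "B": "AAGCA", "C": "TTGCT", "D": "TTGGA"}
TREES = {
    "AB|CD": (("A", "B"), ("C", "D")),
    "AC|BD": (("A", "C"), ("B", "D")),
    "AD|BC": (("A", "D"), ("B", "C")),
}

SCORES = {label: parsimony_score(t, ALIGNMENT) for label, t in TREES.items()}
print(SCORES)  # {'AB|CD': 5, 'AC|BD': 6, 'AD|BC': 7}
print(bootstrap_support(TREES, ALIGNMENT))
```

With these sequences the AB|CD grouping needs the fewest changes, so it is the most parsimonious tree. The bootstrap percentages depend on the random seed and, as Table 3 notes, should not be read directly as confidence limits.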