Introduction to prokaryotes
Introduction and Classification
K R MICRO NOTES 1

Natural system of classification, binomial nomenclature
• The grouping of related organisms together is the basis of classification.
• Reasons for classification:
• 1. to establish the criteria for identifying organisms.
• 2. to arrange related organisms into groups, and
• 3. to provide important information on how organisms evolved.
• Taxonomy (Greek taxis: arrangement or order; nomos: law, or nemein: to distribute or govern) is defined as the
science of biological classification; it is also referred to as systematics.
• It consists of 3 parts: classification, nomenclature, and identification.
• Once a classification scheme is selected, it is used to arrange organisms into groups called taxa (singular taxon: a
category of organisms) based on mutual similarity.
• Nomenclature is the branch of taxonomy concerned with the assignment of names to taxonomic groups in
agreement with published rules.
• Identification is the practical side of taxonomy: the process of determining whether a particular isolate belongs to a
recognized taxon.
• Taxonomists define systematics as the scientific study of organisms with the ultimate objective of characterizing and
arranging them in an orderly manner.
• Systematics draws on disciplines such as morphology, ecology, epidemiology, biochemistry, genetics, molecular
biology and physiology.

• The eighteenth-century Swedish botanist Carolus Linnaeus is credited with
founding the science of taxonomy.
• He originated binomial nomenclature. His scheme, one of the oldest classification systems and
called natural classification, arranges organisms into groups whose members share
many characteristics.
• It was based on anatomical characteristics. Natural classification is applied to
higher organisms on the basis of morphological characteristics, and evolutionary relationships
can be established through it.
• Determining the genus and species of a newly isolated microbe is based on
polyphasic taxonomy. This approach combines phenotypic, phylogenetic and
genotypic features.
• Linnaeus also established a hierarchy of taxonomic ranks.
• Microbes are placed in hierarchical taxonomic levels, with each level or rank sharing
a common set of specific features.
• The highest rank is the domain. Within each domain, each organism is assigned in
descending order, to a phylum, class, order, family, genus and species.
• Some microbes are also assigned to a subphylum or given a subspecies designation.
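The rank hierarchy described above can be sketched as a small data structure. This is only an illustration: the two example lineages and the names `RANKS`, `lineage_string` and `shared_ranks` are assumptions added here, not part of the notes, and phylum names in particular are revised from time to time.

```python
# Ranks in descending order, as listed in the notes.
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")

# Illustrative lineages (assumed examples; check a current reference before relying on them).
E_COLI = {
    "domain": "Bacteria",
    "phylum": "Pseudomonadota",  # formerly Proteobacteria
    "class": "Gammaproteobacteria",
    "order": "Enterobacterales",
    "family": "Enterobacteriaceae",
    "genus": "Escherichia",
    "species": "Escherichia coli",
}
S_ENTERICA = dict(E_COLI, genus="Salmonella", species="Salmonella enterica")


def lineage_string(lineage):
    """Join the assigned ranks from domain down to species."""
    missing = [rank for rank in RANKS if rank not in lineage]
    if missing:
        raise ValueError(f"missing ranks: {missing}")
    return " > ".join(lineage[rank] for rank in RANKS)


def shared_ranks(a, b):
    """Return the ranks, from the top down, at which two lineages agree."""
    shared = []
    for rank in RANKS:
        if a.get(rank) != b.get(rank):
            break
        shared.append(rank)
    return shared


print(lineage_string(E_COLI))
print(shared_ranks(E_COLI, S_ENTERICA))
```

In this sketch the two organisms agree down to the family rank and part ways at genus, which is the sense in which each rank groups organisms sharing a common set of features.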

• Species can be defined as a collection of strains that share many stable properties and differ significantly from other
groups of strains.
• A strain consists of the descendants of a single isolation in pure culture. It is usually made up of a succession of
cultures and is often derived from a single colony. The number of bacteria that gave rise to the original colony is often
unknown. Most bacterial strains are not known to be clones.
• Strains within a species may be described in a number of different ways.
• Biovars are variant strains characterized by biochemical or physiological differences.
• Morphovars differ morphologically and serovars have distinct antigenic properties.
• For each species, one strain is designated as the type strain. It is usually one of the first strains studied and is often more fully
characterized than the others; it serves as the name-bearing reference for the species.
• Only those strains very similar to the type strain are included in a species.
• Each species is assigned to a genus, the next rank in the taxonomic hierarchy. A genus is a well-defined group of one or more
species that is clearly separate from other genera.
• A culture of bacteria is a population of bacterial cells in a given place at a given time, e.g., in this test tube or on that agar
plate. It may have a longer duration, e.g., desiccated cultures. A clone is a population of bacterial cells derived from a
single parent cell.
• Microbiologists name microorganisms using the binomial system of Linnaeus.
• The italicized name consists of 2 parts: the first is the generic name (genus) and the second is the specific epithet (species
name). The specific epithet is stable.
• To be recognised as a new species, genomic, metabolic, morphological, reproductive and ecological data must be
accepted and published in the International Journal of Systematic and Evolutionary Microbiology.
• Bergey’s Manual of Systematic Bacteriology contains only recognized bacterial and archaeal species.

INTERNATIONAL CODE OF NOMENCLATURE OF PROKARYOTES
• The International Committee on Systematics of Prokaryotes (ICSP), through the International Code of
Nomenclature of Bacteria, governs the naming of prokaryotes, including both bacteria (eubacteria) and
archaea (archaebacteria).
• In the 19th century and the first half of the 20th century, bacteriologists tried to follow the provisions of the
Botanical Code of Nomenclature, because bacteria had traditionally been considered fungi, the Schizomycetes.
• Methods of study were, however, very different. Also, much emphasis had to be put on cultural characteristics,
so that type cultures were of critical importance.
• Type cultures are not permitted under the Botanical Code; therefore, at the First International Congress of
Microbiology in Paris in 1930, proposals were made for bacteriology to establish its own Code of Nomenclature.
• A committee under the able guidance of the American bacteriologist R. E. Buchanan began work on this and, at
the Second Congress in London in 1936, a draft Code was presented and placed under the aegis of the
International Committee for Bacteriological Nomenclature [later, the International Committee on Systematic
Bacteriology (ICSB), and now, the ICSP]. Buchanan condensed the provisions for nomenclature into a few
broad principles that are still valid today:
• 1. Names should be stable. This is assured by retaining the first name to be published, the principle of priority.
• 2. Names should be unambiguous. This is assured by establishing type cultures, which can be referred to
whenever there is doubt about the status of a novel bacterium. Type cultures are not necessarily completely
typical, but they function as indispensable points of reference.
• 3. Names should be necessary. This is assured by publication of descriptions of the organisms and the rejection
of names that are superfluous.

Criteria used for classification- types of systems
• Phenetic classification: a system that classifies organisms according
to the mutual similarity of their phenotypic characteristics; organisms
that share many characteristics form a single group or taxon.
• Phylogenetic (or phyletic) classification:
compares organisms on the basis of evolutionary relationships.
Phylogeny refers to the evolutionary development of a species.
• Genotypic classification: compares the genetic similarity between
organisms.
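Phenetic grouping by "mutual similarity" can be made concrete with a similarity coefficient computed over a table of test results. The sketch below is a hedged illustration, not the method of any particular manual: the simple matching and Jaccard coefficients are choices made here, and the two strains and their eight binary test results are invented.

```python
def simple_matching(a, b):
    """Fraction of characters scored the same way (both positive or both negative)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def jaccard(a, b):
    """Shared positives divided by characters positive in at least one strain."""
    if len(a) != len(b):
        raise ValueError("profiles must be of equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical results of eight phenotypic tests (1 = positive, 0 = negative).
strain_a = [1, 1, 0, 1, 0, 0, 1, 0]
strain_b = [1, 0, 0, 1, 0, 1, 1, 0]

print(simple_matching(strain_a, strain_b))  # 6 of 8 characters agree
print(jaccard(strain_a, strain_b))          # 3 shared positives out of 5
```

The two coefficients differ only in whether shared negative results count as evidence of similarity; which is appropriate depends on the characters being scored.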

R. H. Whittaker and other taxonomists worked towards development of the five-kingdom classification system.

Three domain classification
• Carl Woese and G. E. Fox proposed this system of classification, which is widely
accepted.
• Domain is the highest taxonomic rank in the hierarchical biological
classification system, above the kingdom level. There are three domains of
life, the Archaea, the Bacteria, and the Eucarya. Organisms from Archaea
and Bacteria have a prokaryotic cell structure, whereas organisms from the
domain Eucarya (eukaryotes) have cells in which a nucleus separates the
genetic material from the cytoplasm.
• The term “domain” was introduced by Carl R. Woese et al. (1990) together
with the proposal of a natural classification system for all life on Earth,
including microorganisms, which had previously escaped any attempt of
classification based on evolutionary relationships (Woese et al. 1990).
Woese’s proposal to classify life in three major phylogenetic domains was an
attempt to institutionalize the three major phylogenetic groupings of
organisms that he had previously observed and defined.

Classification according to Bergey’s manual of systematic
bacteriology
• In 1923 David Bergey, professor of bacteriology at the University of Pennsylvania, and 4 colleagues published Bergey’s
Manual of Determinative Bacteriology, a classification of bacteria that could be used for the identification of bacterial
species.
• It continues to serve as a relatively brief reference guide in the identification of bacteria based on physiological and
morphological traits.
• Bergey’s Manual of Determinative Bacteriology provides schemes for identifying bacteria and archaea based on
morphology, differential staining, and biochemical tests.
• The first edition of Bergey’s Manual of Systematic Bacteriology, which came out in four volumes from 1984 through
1989, attempted to organize bacterial species according to known phylogenetic relationships, an approach that
continued with a second edition, published in five volumes from 2001 through 2012.
• The four-volume first edition of Bergey’s Manual of Systematic Bacteriology (BMSB) provides descriptions and photographs of species, tests to distinguish among
genera and species, DNA relatedness among organisms, and various numerical taxonomy studies.
• Characteristics used for classification in edition 1: general shape and morphology, Gram staining properties, oxygen
relationship, motility, presence of endospores, and mode of energy production.
• Each volume of the 5-volume second edition of BMSB covers a specific group of microbes. Morphology, physiology, growth
conditions, ecology and other information are provided, making this a valuable reference for microbiologists.
• Microbial classification in the first edition was phenetic- based on phenotypic characterization.
• Classification in the second edition is largely phylogenetic. It also has more ecological information about individual taxa.
Comparisons of nucleic acid sequences, particularly 16S rRNA sequences, are the foundation of this classification.

Comparison of proteins
Proteins are biological molecules made up of building blocks called amino acids. Proteins are essential to life, with structural, metabolic,
transport, immune, signaling and regulatory functions among many other roles.
Proteome: a blanket term for all of the proteins that an organism can express. Each species has its own unique proteome.

• 1. Antibody-based methods
Techniques such as ELISA (enzyme-linked immunosorbent assay) and western blotting rely on the availability of antibodies
targeted toward specific proteins or epitopes to identify proteins and quantify their expression levels.
• 2. Gel-based methods
• Two-dimensional gel electrophoresis (2DE or 2D-PAGE), the first proteomic technique developed, uses an electric current
to separate proteins in a gel based on their charge (1st dimension) and mass (2nd dimension). Differential gel
electrophoresis (DIGE) is a modified form of 2DE that uses different fluorescent dyes to allow the simultaneous
comparison of two to three protein samples on the same gel. These gel-based methods are used to separate proteins
before further analysis by e.g., mass spectrometry (MS), as well as for relative expression profiling.
• Polyacrylamide gel electrophoresis of bacterial proteins has been used as an efficient technique for the classification of
microorganisms, based on phenotypic characteristics expressed by their protein profiles. Bacterial groupings based on
electrophoretic profiles correlate very well with the results obtained by DNA hybridization.
• 3. Chromatography-based methods
• Chromatography-based methods can be used to separate and purify proteins from complex biological mixtures such as cell
lysates. For example, ion-exchange chromatography separates proteins based on charge, size exclusion chromatography
separates proteins based on their molecular size, and affinity chromatography employs reversible interactions between
specific affinity ligands and their target proteins (e.g., the use of lectins for purifying IgM and IgA molecules). These
methods can be used to purify and identify proteins of interest, as well as to prepare proteins for further analysis by e.g.,
downstream MS.
• The zymogram technique, which consists of zone electrophoresis followed by staining in situ for specific enzyme activity
(10), provides a sensitive method for the detection of minor structural differences among enzymes.
• Electrophoretic comparisons of homologous enzymes among related species have been utilized, in a number of isolated
instances, for clarification of certain classification questions in bacteria and in higher species.
• A systematic application of the zymogram method as a primary taxonomic tool requires that a significant number of
enzymes be examined.

• Classification of microbes on the basis of genotypic characters
• Genotypic identification is emerging as an alternative or complement to established phenotypic methods. Organisms can
also be characterized by their genotypic properties. As discussed earlier, several kinds of analysis performed on isolated
nucleic acids furnish information about the genotype: analysis of the base composition of DNA, the study of chemical
hybridization between nucleic acids isolated from different organisms, and the sequencing of nucleic acids. 16S rRNA
sequence-based methods, DNA base ratio and DNA hybridization offer viable options for rapid and reliable identification.
• DNA base ratio (G+C ratio)
• DNA base composition can only prove that organisms are unrelated. The ratio of bases in DNA can vary over a wide range. If two
organisms have different DNA base compositions, they are not related. However, organisms with identical base ratios are not
necessarily related, because the nucleotide sequences in the two organisms could be completely different.
• In molecular biology, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases on a DNA molecule which
are either guanine or cytosine (from a possibility of four different ones, also including adenine and thymine). This may refer to a
specific fragment of DNA or RNA, or that of the whole genome. When it refers to a fragment of the genetic material, it may denote
the GC content of part of a gene (domain), single gene, group of genes (or gene clusters) or even a non-coding region. G (guanine)
and C (cytosine) undergo a specific hydrogen bonding whereas A (adenine) bonds specifically with T (thymine).
• The GC pair is bound by three hydrogen bonds, while AT pairs are bound by two hydrogen bonds. DNA with high GC-content is
more stable than DNA with low GC-content, but contrary to popular belief, the hydrogen bonds do not stabilize the DNA
significantly and stabilization is mainly due to stacking interactions. Despite the higher thermostability this confers on the genetic
material, it has been proposed that cells with high-GC DNA undergo autolysis, reducing the longevity of the cell.
Because of the robustness of the genetic material in high-GC organisms, it was commonly believed that GC content
played a vital part in adaptation to temperature, a hypothesis that has since been refuted.
• In PCR experiments, the GC-content of primers is used to predict their annealing temperature to the template DNA. A higher
GC-content indicates a higher melting temperature.
• GC content is usually expressed as a percentage value, but sometimes as a ratio (called G+C ratio or GC-ratio). GC-content
percentage is calculated as (G + C) / (A + T + G + C) × 100, where each letter stands for the number of that base.
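GC-content percentage can be computed directly from a sequence string, as in the minimal sketch below. The function names `gc_percent` and `wallace_tm` are illustrative. The Wallace rule (about 2 °C per A or T and 4 °C per G or C) is a rule of thumb for short primers that is brought in here as an assumption; it does not come from these notes and only approximates a real melting temperature.

```python
def gc_percent(seq):
    """GC-content as a percentage: (G + C) / (A + T + G + C) * 100."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["G"] + counts["C"]) / total


def wallace_tm(primer):
    """Rule-of-thumb melting temperature in deg C for a short primer (assumed rule)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


print(gc_percent("GGCCAATT"))   # half of the bases are G or C
print(wallace_tm("GGCCAATT"))   # 2*4 + 4*4
print(wallace_tm("GGCCGGCC"))   # all G/C, so a higher estimate
```

The last two lines show the point made in the text: at equal length, the primer with more G and C gets the higher estimated melting temperature.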

The GC-content percentages as well as GC-ratio can be measured by several means but one of the simplest methods is to
measure what is called the melting temperature of the DNA double helix using spectrophotometry. The absorbance of DNA at
a wavelength of 260 nm increases fairly sharply when the double-stranded DNA separates into two single strands when
sufficiently heated. For large numbers of samples, the most commonly used protocol for determining GC ratios uses flow
cytometry.
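The melting-curve approach can be sketched in two steps: find the temperature at which the 260 nm absorbance is halfway through its rise, then convert that melting temperature to a GC percentage. Everything specific below is an assumption made for illustration: the absorbance readings are invented, and the linear relation Tm = 69.3 + 0.41 × (%GC) is the one usually attributed to Marmur and Doty for DNA in standard saline citrate, so the constants do not apply to other buffers.

```python
def melting_temperature(temps, a260):
    """Temperature at which A260 reaches the midpoint of its rise (linear interpolation)."""
    half = (min(a260) + max(a260)) / 2
    points = list(zip(temps, a260))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= half <= a1 and a1 != a0:
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("no midpoint crossing found")


def gc_from_tm(tm_celsius, intercept=69.3, slope=0.41):
    """Invert Tm = intercept + slope * %GC (assumed constants, see lead-in)."""
    return (tm_celsius - intercept) / slope


# Hypothetical melting curve: absorbance rises as the strands separate.
temps = [70, 80, 85, 90, 95, 100]
a260 = [1.00, 1.00, 1.05, 1.20, 1.35, 1.40]

tm = melting_temperature(temps, a260)
print(round(tm, 1), round(gc_from_tm(tm), 1))
```

A real instrument would sample the curve far more densely and usually takes the peak of the first derivative; the midpoint rule is used here only because it is the simplest reading of "melting temperature".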
GC content varies between organisms; this variation is thought to result from differences in
selection, mutational bias and biased recombination-associated DNA repair. The species problem in prokaryotic
taxonomy has led to various suggestions for classifying bacteria, and the ad hoc committee on reconciliation of approaches to
bacterial systematics has recommended the use of GC ratios in higher-level hierarchical classification.

PCR
• Historically, identification and classification of eubacteria are based on phenotypic characteristics. Early
molecular techniques used in bacterial classification were based on GC content, plasmid profiling, and
compatibility to genetic transformation.
• Currently, two fundamental molecular applications are being extensively utilized in bacterial detection and
identification; these are based on hybridization and nucleotide sequencing. Hybridization-based applications
such as Southern blotting, PCR, real-time PCR, microarrays, the universal tagging method, and loop-mediated isothermal
amplification (LAMP) are sensitive and specific techniques for the detection and identification of microbes.
Specific PCR primers have been employed to confirm the presence or absence of target microorganisms or
specific features associated with them such as antibiotic resistance and virulence factors.
• Specific primers proved useful in assessing clinical samples for the presence of slow growing bacteria such
as Mycobacterium tuberculosis and Helicobacter pylori.
• Although specific primers are powerful tools with superior sensitivity and specificity, they cannot
predict the presence or absence of non-target bacterial species in the tested sample.
• To solve this problem, investigators sought universal primers. Unfortunately, “universal primers” do not live
up to their name, since they do not cover all bacteria.
• Multiplexing employs several specific primer pairs in one reaction; it is a time-saving process that allows the simultaneous detection
of a limited number of target microbes. However, multiplexing with specific primers will still miss non-target
microbes.
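Why specific primers say nothing about non-target organisms can be seen in a toy "in silico PCR". The sketch below assumes exact primer matches on a single strand and ignores mismatches, melting temperature and product-size limits; the sequences and the names `amplicon`, `TARGET`, `FWD` and `REV` are all invented for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the predicted product, or None if either primer has no exact site."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so on this strand
    # its binding site reads as the primer's reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


TARGET = "AAGACCTAGGCATCGATCGGATTACATT"   # hypothetical target genome fragment
NON_TARGET = "AAAAACCCCCGGGGGTTTTT"       # hypothetical unrelated organism
FWD, REV = "GACCTAGG", "TGTAATCC"         # hypothetical specific primer pair

print(amplicon(TARGET, FWD, REV))       # a 24-base product
print(amplicon(NON_TARGET, FWD, REV))   # None: no product, and no information either
```

A `None` result only says that this primer pair found no site; it cannot tell whether the sample holds some other organism, which is the limitation the bullets above describe.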

Serological methods
• Serological tests can be used to detect viral and bacterial antigens
and antibodies (IgG and IgM), to help diagnose diseases and
check immune status. A range of techniques is used,
including ELISA, chemiluminescence, agglutination, direct and
indirect immunofluorescence, and Western blotting.
• The high specificity of serological reactions is determined by the
chemical nature of the antigen, and it is possible to detect
differences between complex molecules, particularly proteins,
which cannot yet be distinguished by chemical analysis.
Serology is therefore a delicate tool for comparing and
contrasting antigenic components of the microbial cell, providing
information of use both in identification and classification.

Taxonomy based on genetic methods
• The use of genetic data for taxonomic purposes may be justified on either theoretical or empirical
grounds, but in neither case can such data be regarded as completely reliable indicators of
taxonomic relations. Strictly, all conclusions from genetic evidence (or phenetic evidence, for that
matter) can only be expressed as probability statements. It will become clear from subsequent
sections that the theoretical bases of gene transfer evidence are extremely slender (although
theoretical bases are strong for certain other classes of data, in particular DNA pairing).
• Gene transfer is very much more frequent between strains that are closely related taxonomically
than between strains that are distantly related. At the very least, the occurrence of gene transfer
points the taxonomist to organisms that should be studied together.
• Three main classes of mechanism for gene transfer are known (although the borders between
them may not be entirely sharp): those where genes are transferred as soluble DNA molecules,
those involving transfer by a cellular particle, and those involving cell contact followed by transfer
of the whole or part of the bacterial chromosome. In all cases (with the exception of RNA phages)
the essential step is the transfer of DNA in some form, and except perhaps in the recombination
systems of Actinomycetales it is effectively unidirectional.
• Bacterial genes can be transferred across wide taxonomic gaps; no sharp distinction can be made
between bacterial genes and genes of phages and plasmids; there is a growing view that the
bacterial genome is best regarded as consisting of a number of different replicons (chromosome
and plasmids) some of which are unstable and are continually merging

Bacterial phylogeny
• The branching pattern of ancestor–descendant relationships among ‘taxa’
(e.g., species or their genes) is called a ‘phylogeny’. ‘Phylogenetics’ is the
process of attempting to estimate these historical relationships by examining
information such as DNA, protein sequences, or morphological (shape)
characters from extant taxa. This information is generally presented using a
mathematical tree – a structure used to describe the evolutionary history of the
taxa at a high level. These trees come in several different varieties and can be
inferred in several different ways. There is a great amount of effort being put
into methods of estimating trees, as well as determining particular phylogenies
for species of interest.
• Phylogeny provides another perspective on biodiversity that allows an
objective way to compare uniqueness and diversity of taxa. Although various
specific measures of phylogenetic diversity have been proposed, most share a
basic approach by which phylogenetic trees are used to evaluate species
richness in concordant groups.
• Microbial phylogenetics is the study of the evolutionary relatedness among
various groups of microorganisms. The molecular approach to microbial
phylogenetic analysis revolutionized our thinking about evolution in the

• A phylogenetic tree, also known as a phylogeny, is a diagram that depicts the lines of
evolutionary descent of different species, organisms, or genes from a common
ancestor. Phylogenies are useful for organizing knowledge of biological diversity, for
structuring classifications, and for providing insight into events that occurred during
evolution. Furthermore, because these trees show descent from a common ancestor,
and much of the strongest evidence for evolution comes in the form of
common ancestry, phylogenies are central to understanding that evidence.
• Phylogenetic trees represent hypotheses about the evolutionary relationships among
a group of organisms.
• Phylogenetic trees are characterized by a series of branching points leading from the
‘root’ or common ancestor of the species up to the tips or contemporary organisms.
• A phylogenetic tree may be built using morphological (body shape), biochemical,
behavioral, or molecular features of species or other groups.
• In building a tree, we organize species into nested groups based on shared derived
traits (traits different from those of the group's ancestor).
• The sequences of genes or proteins can be compared among species and used to build
phylogenetic trees. Closely related species typically have few sequence differences,
while less related species tend to have more.
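The idea that closely related species have fewer sequence differences can be illustrated by counting mismatches between aligned sequences. This is a minimal sketch: the three short sequences are invented, they are assumed to be already aligned, and the uncorrected proportion of differing sites (often called the p-distance) is only one of several possible measures.

```python
def p_distance(a, b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical aligned fragments of the same gene from three species.
seqs = {
    "sp_A": "ACGTACGTAC",
    "sp_B": "ACGTACGTAT",
    "sp_C": "ACGAACTTGT",
}

for first, second in [("sp_A", "sp_B"), ("sp_A", "sp_C"), ("sp_B", "sp_C")]:
    print(first, second, p_distance(seqs[first], seqs[second]))
```

On these toy data sp_A and sp_B differ at one site in ten while sp_C differs from both at three or four, so a tree built from them would place sp_A and sp_B together.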

• Phylogenetic trees, by analogy to botanical trees, are made of
leaves, nodes, and branches
• It is a branching diagram composed of nodes and branches. The
branching pattern of a tree is called the topology of the tree.
• The nodes represent taxonomic units, such as species (or higher
taxa), populations, genes, or proteins.
• A branch is called an edge, and represents the time estimate of the
evolutionary relationships among the taxonomic units. One branch
can connect only two nodes. In a phylogenetic tree, the terminal
nodes represent the operational taxonomic units (OTUs) or leaves.
• The OTUs are the actual objects—such as the species, populations,
or gene or protein sequences—being compared, whereas the
internal nodes represent hypothetical taxonomic units (HTUs).
• An HTU is an inferred unit and it represents the last common
ancestor (LCA) to the nodes arising from this point. Descendants
(taxa) that split from the same node form sister groups, and a taxon
that falls outside the clade is called an outgroup.
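The vocabulary above (terminal nodes or OTUs, internal nodes or HTUs, last common ancestor) maps onto a very small data structure. The sketch below stores a made-up rooted tree as a child-to-parent table; the node names and the functions `ancestors`, `last_common_ancestor` and `terminal_nodes` are hypothetical and chosen only for illustration.

```python
# Child -> parent table for a toy rooted tree.
# A, B, C, D are OTUs (leaves); h1, h2 and root are HTUs (inferred ancestors).
PARENT = {
    "A": "h1", "B": "h1",
    "h1": "h2", "C": "h2",
    "h2": "root", "D": "root",
    "root": None,
}


def ancestors(node, parent):
    """The node itself followed by its ancestors, ending at the root."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def last_common_ancestor(a, b, parent):
    """First node on b's path to the root that also lies on a's path."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def terminal_nodes(parent):
    """Nodes that are nobody's parent, i.e. the OTUs."""
    internal = {p for p in parent.values() if p is not None}
    return sorted(node for node in parent if node not in internal)


print(terminal_nodes(PARENT))
print(last_common_ancestor("A", "B", PARENT))  # A and B are sister groups
print(last_common_ancestor("A", "C", PARENT))
print(last_common_ancestor("B", "D", PARENT))  # D acts as the outgroup here
```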

• The leaves of a tree, also called tips, can be species,
populations, individuals, or even genes. If the tips
represent a formally named group, they are called taxa
(singular: taxon).
• A ‘taxon’ is a group of organisms at any hierarchical
rank, such as a family, genus, or species.
• The tips of a phylogenetic tree are most commonly
living, but may also represent the ends of extinct
lineages or fossils.
• As in the trees you are already familiar with, tips or
leaves are subtended by branches.
• A branch, which represents the persistence of a lineage
through time, may subtend one or many leaves.
Branches connect to other branches at nodes, which
represent the last common ancestors of organisms at
the tips of the descendant lineages.
• A branch connecting a tip to a node is called an
external branch, whereas one connecting two nodes is
called an internal branch.

• Reading a tree from the past toward the present, a node indicates a point where an ancestral lineage
(the branch below the node) split to give rise to two or more descendant lineages (the branches
above the node). Branching on an evolutionary tree is also called ‘cladogenesis’ or ‘lineage splitting.’
After a lineage splits into two, evolution happens independently in these newly formed descendant
lineages. The sequence of lineage splits in a tree creates its structure or ‘topology.’ Tree topology
shows us the branching of lineages through time that gave rise to the tips.
• ‘Clades’ are groupings on a tree that include a node and all of the lineages descended from that
node. The set of all the tips in a clade is defined as being ‘monophyletic,’ referring to the fact that it
includes all the descendants of an ancestral lineage.
For example, if tips C, D, and E make up such a grouping, we could say that the tree supports monophyly of taxa C,
D, and E or, put another way, that C, D, and E together form a
clade. Clades can be hierarchically nested within one
another.
A tree’s topology can now be defined more precisely as
the set of clades that the tree contains.
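Treating a topology as "the set of clades that the tree contains" translates almost directly into code. In the hedged sketch below a rooted tree is written as nested tuples of tip names; the example tree and the function names `tips`, `clades` and `is_monophyletic` are invented, and single tips are counted as (trivial) clades.

```python
def tips(node):
    """Set of tip names below a node (a tip is a string, an internal node a tuple)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Tip sets of every clade in the tree: one per node, tips included."""
    result = {tips(node)}
    if not isinstance(node, str):
        for child in node:
            result |= clades(child)
    return result


def is_monophyletic(tree, group):
    """True if the group is exactly the set of tips descended from some node."""
    return frozenset(group) in clades(tree)


# Toy tree: (A, B) is one lineage, and C is sister to (D, E) in the other.
TREE = (("A", "B"), ("C", ("D", "E")))

print(is_monophyletic(TREE, {"C", "D", "E"}))  # True
print(is_monophyletic(TREE, {"D", "E"}))       # True: a clade nested inside the one above
print(is_monophyletic(TREE, {"B", "C"}))       # False: leaves out some descendants
```

Two trees drawn differently but yielding the same `clades(...)` set have the same topology in the sense defined above.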

• Phylogenetic trees are either rooted or unrooted.
• The root of the phylogenetic tree is inferred to be the oldest point in the tree
and corresponds to the theoretical last common ancestor of all taxonomic
units included in the tree. The root gives directionality to evolution within the
tree.
• Accurate rooting of a phylogenetic tree is important for directionality of
evolution and increases the power of interpreting genetic changes between
sequences.
• A rooted tree has a node (the
root) from which the rest of the tree diverges. This root is frequently referred to
as the last universal common ancestor (LUCA), from which the other
taxonomic groups have descended and diverged over time. In
molecular phylogenetics, the LUCA and LCA are represented by DNA
or protein sequences. Obtaining a rooted tree is ideal, but most phylogenetic-
tree-reconstruction algorithms produce unrooted trees.
• Phylogenetic trees can be scaled or unscaled. In a scaled tree, the branch
length is proportional to the amount of evolutionary divergence (e.g. the
number of nucleotide substitutions) that has occurred along that branch. In an
unscaled tree, the branch length is not proportional to the amount of evolutionary divergence.

Different forms of presentation of the phylogenetic tree. The phylogenetic tree in D is a dendrogram derived from hierarchical
clustering (see text). A, B, and D show rooted trees, while C shows an unrooted tree. Taxa that share specific derived characters
are grouped into clades. (A) Smaller clades located within a larger clade are called nested clades. (B) The terminal nodes
represent the operational taxonomic units, also called “leaves”; each terminal node could be a taxon (species or higher taxa), or a
gene or protein sequence. The internal nodes represent hypothetical taxonomic units. An HTU represents the last common
ancestor to the nodes arising from this point. Two descendants that split from the same node are called sister groups and a
taxon that falls outside the clade is called an outgroup. Rooted trees have a node from which the rest of the tree diverges,
frequently called the last universal common ancestor (LUCA).

• The purpose of phylogenetic analysis is to understand the past
evolutionary path of organisms. Even though we will never know for
certain the true phylogeny of any organism, phylogenetic analysis
provides best assumptions, thereby providing a framework for various
disciplines in microbiology. Due to the technological innovation of
modern molecular biology and the rapid advancement in computational
science, accurate inference of the phylogeny of a gene or organism
seems possible in the near future. There has been a flood of nucleic
acid sequence information, bioinformatic tools and phylogenetic
inference methods in public domain databases, literature and worldwide
web space. Phylogenetic analysis has long played a central role in
basic microbiology, for example in taxonomy and ecology. In addition,
more recently emerging fields of microbiology, including comparative
genomics and phylogenomics, require substantial knowledge and
understanding of phylogenetic analysis and computational skills to
handle the large-scale data involved. Methods of phylogenetic analysis
and relevant computer software tools lend accuracy, efficiency and
availability to the task.

• A phylogram is a scaled phylogenetic tree in which the branch lengths
are proportional to the amount of evolutionary divergence.
• A cladogram is a branching hierarchical tree that shows the
relationships between clades; cladograms are unscaled.
• The word dendrogram means a hierarchical cluster arrangement where
similar objects (based on some defined criteria) are grouped into
clusters; hence, a dendrogram shows the relationships among various
clusters.
• Dendrograms are also used outside the scope of phylogenetics and even
outside of biology. Dendrograms are frequently used in computational
molecular biology to illustrate the branching based on clustering of genes
or proteins.

Homology
• Homologies are “similarities of complex structures or patterns which
are caused by a continuity of biological information (in the sense of
instruction)”. Thus, it is believed that homologous characters are
based on the same biological information in the common ancestor.
• Homology depends on comparison between characters. Usually two
types of homology are considered, one (called “phylogenetic” or
“evolutionary”) between species, the other (called “serial”, “iterative”
or “homonomy”) within individuals. If we look closer, however, at
least four levels of comparison correlated with four types of
homology can be distinguished.

• There are four steps in general phylogenetic analysis of molecular sequences: (i) selection of a suitable molecule or
molecules (phylogenetic marker), (ii) acquisition of molecular sequences, (iii) multiple sequence alignment (MSA) and
(iv) phylogenetic treeing and evaluation.

Different methods of phylogenetic tree
construction
• Phylogenetics is the study of genetic relatedness of individuals of the same, or
different, species. Through phylogenetics, evolutionary relationships can be
inferred.
• A phylogenetic tree may be rooted or unrooted, depending on whether the
ancestral root is known or unknown, respectively.
• A phylogenetic tree’s root is the origin of evolution of the individuals studied.
• Branches between leaves show the evolutionary relationships between
sequences, individuals, or species, and branch length represents the amount of
evolutionary change (or, in a time-calibrated tree, evolutionary time).
• When constructing and analyzing phylogenetic trees, it is important to remember
that the resulting tree is simply an estimate and is unlikely to represent the true
evolutionary tree of life.
• Various methods can be used to construct a phylogenetic tree.
• The two most commonly used and most robust approaches are maximum
likelihood and Bayesian methods.
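• To make the vocabulary of roots, leaves and branch lengths concrete, the sketch below stores a small rooted tree as a child-to-parent map and sums branch lengths along the path between two leaves. The taxon names, node names and branch lengths are hypothetical values chosen for illustration, not data from these notes.

```python
# Rooted tree as: child -> (parent, branch length to parent).
# All names and lengths are made-up illustration values.
TREE = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
}


def path_to_root(node, tree):
    """Map each ancestor of `node` (and the node itself) to its distance from `node`."""
    distance = 0.0
    path = {node: 0.0}
    while node in tree:  # the root has no entry, so the walk stops there
        parent, length = tree[node]
        distance += length
        path[parent] = distance
        node = parent
    return path


def patristic_distance(a, b, tree):
    """Sum of branch lengths on the path connecting leaves a and b."""
    up_a = path_to_root(a, tree)
    up_b = path_to_root(b, tree)
    # The most recent common ancestor gives the smallest summed distance.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(round(patristic_distance("A", "B", TREE), 3))
print(round(patristic_distance("A", "C", TREE), 3))
```

• A and B meet at the internal node n1, whereas the path from A to C has to pass through the root, so the second distance is larger.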

• There are currently two main categories of tree-building methods, each having advantages and limitations.
• The first category is based on discrete characters, which are molecular sequences from individual taxa. The basic
assumption is that characters at corresponding positions in a multiple sequence alignment are homologous among
the sequences involved. Therefore, the character states of the common ancestor can be traced from this dataset.
Another assumption is that each character evolves independently and is therefore treated as an individual
evolutionary unit.
• The second category of phylogenetic methods is based on distance, which is the amount of dissimilarity between
pairs of sequences, computed on the basis of sequence alignment. The distance-based methods assume that all
sequences involved are homologous and that tree branches are additive, meaning that the distance between two
taxa equals the sum of all branch lengths connecting them.
• DISTANCE-BASED METHODS
• The algorithms for the distance-based tree-building method can be subdivided into either clustering based or
optimality based. The clustering-type algorithms compute a tree based on a distance matrix starting from the most
similar sequence pairs. These algorithms include an unweighted pair group method using arithmetic average
(UPGMA) and neighbor joining. The optimality-based algorithms compare many alternative tree topologies and select
one that has the best fit between estimated distances in the tree and the actual evolutionary distances. This category
includes the Fitch–Margoliash and minimum evolution algorithms.
• Clustering-Based Methods
• Unweighted Pair Group Method Using Arithmetic Average
• The simplest clustering method is UPGMA, which builds a tree by a sequential clustering method. Given a distance
matrix, it starts by grouping two taxa with the smallest pairwise distance in the distance matrix.
• The basic assumption of the UPGMA method is that all taxa evolve at a constant rate and that they are equally
distant from the root.
• Neighbor Joining
• The UPGMA method uses unweighted distances and assumes that all taxa have constant evolutionary rates.
• The neighbor-joining (NJ) method, in contrast, does not assume the taxa to be equidistant from the root. It corrects
for unequal evolutionary rates between sequences by using a conversion step.
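• The distance-based workflow described above can be sketched in a few lines: compute pairwise distances from an alignment, then repeatedly join the two closest clusters as UPGMA does. This is a minimal illustration, assuming the uncorrected p-distance (proportion of differing sites) as the distance measure and four made-up aligned sequences; it returns only the tree topology as a Newick-style string and does not compute branch lengths.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; return the topology as a string."""
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {(a, b): p_distance(seqs[a], seqs[b])
            for a, b in combinations(sorted(seqs), 2)}
    while len(sizes) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(dist, key=dist.get)
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # average over all leaf pairs, i.e. a size-weighted mean.
        new_dist = {}
        for c in sizes:
            d_ac = dist[tuple(sorted((a, c)))]
            d_bc = dist[tuple(sorted((b, c)))]
            new_dist[tuple(sorted((merged, c)))] = (na * d_ac + nb * d_bc) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical aligned sequences, used only to exercise the functions.
example = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
print(upgma(example))
```

• Because UPGMA always joins the closest pair first, it groups A with B and C with D here; a method such as NJ would additionally adjust the distances for unequal rates before choosing which pair to join.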

• CHARACTER-BASED METHODS
• Character-based methods (also called discrete methods) are based directly on the sequence characters rather than on pairwise
distances.
• Maximum Parsimony
• The parsimony method chooses a tree that has the fewest evolutionary changes or shortest overall branch lengths.
• Maximum parsimony attempts to reduce branch length by minimizing the number of evolutionary changes required between
sequences. The optimal tree would be the shortest tree with the fewest mutations. All potential trees are evaluated, and the tree
with the least amount of homoplasy, or convergent evolution, is selected as the most likely tree. (A homoplasy is a character
shared by two or more taxa that was not inherited from their common ancestor. It is the opposite of a homology, in which the
shared trait derives from a common ancestor.) Because the most-parsimonious tree is by definition the shortest tree, it does not
necessarily best represent the evolutionary changes that have actually occurred. Maximum parsimony can also be statistically
inconsistent under some conditions, which complicates drawing conclusions.
• Maximum likelihood
• Another character-based approach is ML, which uses probabilistic models to choose a best tree that has the highest
probability or likelihood of reproducing the observed data. It finds a tree that most likely reflects the actual evolutionary
process. In principle, ML is an exhaustive method that searches every possible tree topology and considers every position
in an alignment, not just informative sites.
• Despite being slow and computationally expensive, maximum likelihood is one of the most commonly used phylogenetic methods
in research papers, and it is well suited to phylogeny construction from sequence data. For each nucleotide position in a sequence, the
maximum likelihood algorithm estimates the probability of that position being a particular nucleotide, based on whether the
ancestral sequences possessed that specific nucleotide.
• Maximum likelihood is based on the concept that each nucleotide site evolves independently, enabling phylogenetic
relationships to be analyzed at each site. The maximum likelihood method can be carried out in a reasonable amount of time for
four sequences. If more than four sequences are to be analyzed, then basic trees are constructed for the initial four sequences,
and further sequences are subsequently added, and maximum likelihood is recalculated.
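• The parsimony criterion described above can be illustrated by counting the minimum number of changes a given tree requires and comparing candidate trees. The sketch below uses Fitch's small-parsimony algorithm, which these notes do not name; it is swapped in here as one standard way to score a fixed tree. The sequences are made up, and trees are written as nested pairs of leaf names.

```python
def fitch(tree, states):
    """Score one alignment column on a rooted binary tree.

    tree:   a leaf name (str) or a (left, right) tuple of subtrees
    states: dict mapping leaf name -> character at this column
    Returns (set of candidate ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: at least one change occurred on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


# Hypothetical aligned sequences and the three possible groupings of four taxa.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACTA",
    "D": "TCGAACTA",
}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
best = min(candidates, key=lambda t: parsimony_score(t, alignment))
print(best, parsimony_score(best, alignment))
```

• With only four taxa all three groupings can be scored directly. The number of possible trees grows very quickly with the number of taxa, which is why practical programs rely on heuristic searches.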

Evaluating phylogenies
• After phylogenetic tree construction, the next step is to statistically evaluate the reliability of the
inferred phylogeny. There are two questions that need to be addressed. One is how reliable the tree
or a portion of the tree is; and the second is whether this tree is significantly better than another
tree. To answer the first question, we need to use analytical resampling strategies such as
bootstrapping and jackknifing, which repeatedly resample data from the original dataset. For the
second question, conventional statistical tests are needed.
• Bootstrapping is a statistical technique that tests the sampling errors of a phylogenetic tree.
• In addition to bootstrapping, another often used resampling technique is jackknifing.
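• The resampling idea can be sketched as follows: bootstrapping draws alignment columns with replacement to build pseudo-alignments of the original length, whereas jackknifing deletes a share of the columns without replacement. In this minimal illustration the "tree" is reduced to the single question of which two sequences group together first, which is a stand-in for a full tree-building step. Deleting half of the columns in the jackknife, the replicate count and the seed are assumptions made for the example.

```python
import random


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(alignment):
    """Stand-in for tree building: the pair of taxa with the smallest distance."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))


def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement, keeping the alignment length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def jackknife_replicate(alignment, rng, keep=0.5):
    """Keep a random subset of columns, sampled without replacement."""
    length = len(next(iter(alignment.values())))
    cols = sorted(rng.sample(range(length), int(length * keep)))
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, pair, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which `pair` is still the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(bootstrap_replicate(alignment, rng)) == pair
        for _ in range(replicates)
    )
    return hits / replicates


# Hypothetical aligned sequences, used only to exercise the functions.
example = {"A": "ACGTACGTAC", "B": "ACGTACGTAA", "C": "TCGAACTTAA"}
print(bootstrap_support(example, ("A", "B")))
```

• A grouping that reappears in most replicates is considered well supported by the data, while one that appears only occasionally depends on a few columns and should be treated with caution.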

Dichotomous Key
• Classification is very important to the field of biology. As we continue
to discover new species, learn better techniques for analyzing
relationships between species (e.g., DNA analysis) and share
information internationally, it is important to have systems in place to
identify and classify organisms. A dichotomous key is a tool that
helps to identify an unknown organism. It is a series of paired
statements, each pair offering two choices that describe
characteristics of the unidentified organism. The user has to make a
choice of which of the two statements best describes the unknown
organism, then based on that choice moves to the next set of
statements, ultimately ending in the identity of the unknown.
Dichotomous keys are often used in field guides to help users
accurately identify a plant or animal, but can be developed for
virtually any object. They are particularly helpful when two species
are very similar to one another.
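• The way a dichotomous key is followed can be written as a small lookup structure: each step holds two statements, and each statement leads either to another step or to an identification. The key below is a simplified toy example built for illustration, not a diagnostic scheme from these notes; the step numbers, statements and genus-level outcomes are assumptions.

```python
# Each step offers two statements; each statement leads to another step
# (an int) or to an identification (a str). Toy key for illustration only.
KEY = {
    1: [("Gram-positive", 2), ("Gram-negative", 3)],
    2: [("catalase-positive", "Staphylococcus"),
        ("catalase-negative", "Streptococcus")],
    3: [("ferments lactose", "Escherichia"),
        ("does not ferment lactose", "Salmonella")],
}


def identify(traits, key=KEY, start=1):
    """Follow the key using the observed traits until an identity is reached."""
    step = start
    while True:
        for statement, outcome in key[step]:
            if statement in traits:
                break
        else:
            raise ValueError(f"neither statement at step {step} matches the traits")
        if isinstance(outcome, int):
            step = outcome  # move on to the next pair of statements
        else:
            return outcome  # reached an identification


print(identify({"Gram-positive", "catalase-negative"}))
```

• Each choice narrows the possibilities, so the user reaches an identification after answering only as many questions as the key has levels along that path.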
