1. Biol336-12 1
Molecular evolution
Part I: The evolution of macromolecules.
Part II: The reconstruction of the evolutionary history of genes and organisms.
2. Biol336-12 2
Molecular Evolution
www.ncbi.nlm.nih.gov
AGAMOUS; transcription factor [ Arabidopsis thaliana ]
What information can DNA sequences give us?
Evaluating the role of drift/demography vs. selection in trait divergence.
Identifying function: looking at genes that share an evolutionary history.
3. Biol336-12 3
Molecular Evolution
3D protein structure of human proinsulin (Munte et al. 2004 FEBS J)
1952: Frederick Sanger and coworkers determine the complete amino acid sequence of insulin.
MALWMRLLPLLALLALWGPDPAAAFVNQHLCG
4. Biol336-12 4
How and why have molecular sequences evolved to be the way they are?
Molecular Evolution
7. Biol336-12 7
Molecular Evolution
What happens after a mutation arises in the DNA sequence at a locus?
Polymorphism: the mutant allele is one of several alleles present in the population.
Substitution: the mutant allele fixes in the population. (New mutations at other nucleotides may occur later.)
9. Biol336-12 9
Molecular Evolution
Imagine that five sequences are obtained from each of two species, and that the sequences are related to each other as shown here.
10. Biol336-12 10
Molecular Evolution
Any mutation that happens on a red branch will appear as a polymorphism within species 1.
Any mutation that happens on a blue branch will appear as a polymorphism within species 2.
Any mutation that happens on the green branch will appear as a fixed difference between the species.
[Figure: phylogeny of the sampled sequences; branches labelled "within species" and "between species"]
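The branch logic above can be sketched in Python. This is a minimal illustration, assuming aligned, gap-free sequences of equal length; the function name `classify_sites` and the toy sequences are hypothetical, not data from the lecture. A site counts as polymorphic within a species when that species' sample shows more than one base, and as a fixed difference when each species is monomorphic but for different bases.

```python
def classify_sites(species1, species2):
    """Return (polymorphic in species 1, polymorphic in species 2, fixed differences).

    Each argument is a list of aligned sequences of equal length (hypothetical input).
    """
    poly1, poly2, fixed = [], [], []
    for i in range(len(species1[0])):
        bases1 = {seq[i] for seq in species1}
        bases2 = {seq[i] for seq in species2}
        if len(bases1) > 1:
            poly1.append(i)  # mutation on a within-species branch of species 1
        if len(bases2) > 1:
            poly2.append(i)  # mutation on a within-species branch of species 2
        if len(bases1) == 1 and len(bases2) == 1 and bases1 != bases2:
            fixed.append(i)  # mutation on the between-species branch
    return poly1, poly2, fixed


# Toy data: five sequences from each of two species (hypothetical).
species1 = ["ACGTA", "ACGTA", "ACGTT", "ACGTA", "ACGTT"]
species2 = ["GCGAA", "GCGAA", "GCCAA", "GCGAA", "GCGAA"]
print(classify_sites(species1, species2))  # ([4], [2], [0, 3])
```

Sites that are polymorphic in one species and also differ from the other species are not counted as fixed here; that is a simplifying choice for the sketch.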
12. Biol336-12 12
Molecular Evolution
Substitution rate: the rate at which mutant alleles rise to fixation within a lineage.
By comparing DNA sequences from different organisms, we can estimate the rate at which mutations appear and fix, causing base-pair substitutions.
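As a rough illustration of that estimate, a common textbook approach divides the distance K between two sequences by twice their divergence time T, because both lineages have accumulated substitutions since they split (r = K / 2T). The sketch below assumes aligned, gap-free sequences and a known divergence time; the sequences and the 80-million-year split are hypothetical. The raw proportion of differing sites (p-distance) underestimates K when a site has changed more than once, so the Jukes-Cantor correction is shown as an alternative.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ (no correction for multiple hits)."""
    diffs = sum(a != b for a, b in zip(seq_a, seq_b))
    return diffs / len(seq_a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance, valid for p < 0.75."""
    return -0.75 * math.log(1 - 4 * p / 3)


def substitution_rate(distance, divergence_time):
    """r = K / (2T): both lineages have evolved for T years since the split."""
    return distance / (2 * divergence_time)


seq_a = "ACGTACGTACGTACGTACGT"  # hypothetical sequences, 2 differences in 20 sites
seq_b = "ACGTACGTACCTACGTACGA"
T = 80e6  # hypothetical divergence time in years

p = p_distance(seq_a, seq_b)
print(p)                                      # 0.1
print(substitution_rate(p, T))                # substitutions per site per year
print(substitution_rate(jukes_cantor(p), T))  # slightly higher after correction
```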
13. Biol336-12 13
Molecular Evolution
How many selectively neutral mutants reach fixation per unit time?
Neutral mutations occur at a rate μ per locus per generation.
In a diploid population at a particular locus, there are 2N alleles.
The number of mutants arising every generation at a given locus in a diploid population of size N is $2N\mu$.
The probability of fixation of a selectively neutral allele is $1/(2N)$.
Thus, the substitution rate for neutral alleles is $\frac{1}{2N} \times 2N\mu = \mu$.
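The claim that a neutral mutant fixes with probability 1/(2N) can be checked with a small Wright-Fisher simulation. This is a sketch under simplifying assumptions (constant population size, random mating, non-overlapping generations, a single new mutant copy); the population size, number of replicates, seed and mutation rate are arbitrary illustration values, not figures from the lecture.

```python
import random


def neutral_fixation_probability(n_diploid, replicates, seed=1):
    """Estimate the fixation probability of one new neutral mutant copy."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    fixed = 0
    for _ in range(replicates):
        count = 1  # a single new mutant among 2N gene copies
        while 0 < count < two_n:
            freq = count / two_n
            # Wright-Fisher drift: each of the 2N copies in the next
            # generation is drawn independently from the current pool.
            count = sum(1 for _ in range(two_n) if rng.random() < freq)
        if count == two_n:
            fixed += 1
    return fixed / replicates


N = 25
mu = 1e-8  # hypothetical neutral mutation rate per locus per generation
estimate = neutral_fixation_probability(N, replicates=10_000)
print(estimate, 1 / (2 * N))        # simulated vs. expected 1/(2N)
print((2 * N * mu) * estimate, mu)  # substitution rate is close to mu
```

Because the 2N in the number of new mutants cancels the 1/(2N) fixation probability, the simulated substitution rate lands near μ regardless of the N chosen.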
14. Biol336-12 14
Molecular Evolution
What is the substitution rate for neutral alleles? $\mu$
What is the substitution rate for beneficial alleles (s > 0)?
Number of new mutants per generation × fixation probability for a beneficial allele: $(2N\mu)(2s) = 4N\mu s$
What is the substitution rate for deleterious alleles? Close to zero.
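The neutral and beneficial answers can be compared with simple arithmetic. The sketch below uses the expressions on this slide (the beneficial case relies on the large-N approximation P(fix) ≈ 2s); N, μ and s are hypothetical values. It shows that the beneficial substitution rate is 4Ns times the neutral rate.

```python
def neutral_rate(mu):
    """Neutral substitution rate: (2N mu)(1/2N) = mu."""
    return mu


def beneficial_rate(n_diploid, mu, s):
    """Beneficial substitution rate: (2N mu)(2s) = 4 N mu s (large N, s > 0)."""
    return (2 * n_diploid * mu) * (2 * s)


N, mu, s = 1000, 1e-8, 0.01  # hypothetical values
print(beneficial_rate(N, mu, s))                     # about 4e-7
print(beneficial_rate(N, mu, s) / neutral_rate(mu))  # ratio equals 4Ns (about 40)
```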
15. Biol336-12 15
Molecular Evolution
Consider a numerical example: a new mutant arises in a population of 1000 individuals.
If it is neutral, the probability it will fix is $1/(2N) = 1/(2 \times 1000) = 0.0005$ (0.05%).
If it confers a selective advantage of s = 0.01, the probability it will fix is about $2s = 0.02$ (2%).
If it has a selective disadvantage of s = -0.001, the probability it will fix is only about 0.004%.
16. Biol336-12 16
Molecular Evolution
$P_{\text{fixation}} = \dfrac{2s}{1 - e^{-4Ns}}$
If the population size is very large, the probability of fixation for an advantageous mutation converges to 2s.
Given s = 0.01, N = 1000: P(fixation) = 0.02 (2%).
Given s = 0.01, N = 100: P(fixation) = 0.02037.
17. Biol336-12 17
Molecular Evolution
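$P_{\text{fixation}} = \dfrac{2s}{1 - e^{-4Ns}}$
What about slightly deleterious mutations?
s = -0.001, N = 1000: P(fixation) ≈ 0.0000373 (neutral: 1/2N = 0.0005)
s = -0.001, N = 100: P(fixation) ≈ 0.00407 (neutral: 1/2N = 0.005)
s = -0.001, N = 10: P(fixation) ≈ 0.0490 (neutral: 1/2N = 0.05)
In small populations, a slightly deleterious mutation fixes almost as often as a neutral one.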
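The fixation probabilities on slides 15 to 17 can be recomputed directly from P(fixation) = 2s / (1 - e^(-4Ns)), the approximation given in the lecture for a single new mutant with small |s|. This is a sketch: the neutral case s = 0 is handled separately because the formula becomes 0/0 there, and its limit is 1/(2N). `math.expm1` is used so that 1 - e^(-x) stays accurate when 4Ns is close to zero.

```python
import math


def fixation_probability(s, n_diploid):
    """Approximate fixation probability of a new mutant: 2s / (1 - exp(-4Ns))."""
    if s == 0:
        return 1 / (2 * n_diploid)  # neutral limit
    # -expm1(-x) == 1 - exp(-x), computed without cancellation error
    return 2 * s / -math.expm1(-4 * n_diploid * s)


cases = [(0.01, 1000), (0.01, 100), (-0.001, 1000), (-0.001, 100), (-0.001, 10)]
for s, n in cases:
    p = fixation_probability(s, n)
    neutral = fixation_probability(0, n)
    print(f"s={s:+.3f}  N={n:<5d}  P(fix)={p:.6f}  neutral 1/2N={neutral:.6f}")
```

For s = -0.001 this gives roughly 0.0000373, 0.00407 and 0.0490 for N = 1000, 100 and 10, which is the pattern summarised on slide 17.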
18. Biol336-12 18
Molecular Evolution
Are most substitutions (fixed changes) due to drift or to natural selection?
Neutralists vs. selectionists
Both agree that:
Most mutations are deleterious and are removed. Some mutations are favourable and are fixed.
In dispute:
Are most replacement mutations that fix beneficial or neutral?
Is observed polymorphism due to selection or drift?
19. Biol336-12 19
Molecular Evolution
Silent (or synonymous) mutations, where the amino acid remains unchanged, are more likely to be neutral.
Replacement (or non-synonymous) mutations, which cause an amino acid change, are more likely to experience selection.
– The form and strength of selection depend on the gene and its function.
20. Biol336-12 20
Molecular Evolution
Mammalian gene      Non-synonymous substitution rate   Synonymous substitution rate
                    (per site per 10^9 years)          (per site per 10^9 years)
Histone 4           0.00                               4.52
Histone 3           0.00                               3.94
Myosin              0.10                               2.15
Insulin             0.20                               3.03
Growth Hormone      1.34                               3.79
Immunoglobulin κ    2.03                               5.56
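One way to read the table is to divide each gene's non-synonymous rate by its synonymous rate. The sketch below does this with the values from the table, treating the synonymous rate as a rough stand-in for the neutral rate (an assumption). A ratio well below 1 means most amino acid changes in that gene have been removed by selection.

```python
# Substitution rates per site per 10^9 years, taken from the table above:
# (non-synonymous, synonymous)
RATES = {
    "Histone 4": (0.00, 4.52),
    "Histone 3": (0.00, 3.94),
    "Myosin": (0.10, 2.15),
    "Insulin": (0.20, 3.03),
    "Growth Hormone": (1.34, 3.79),
    "Immunoglobulin kappa": (2.03, 5.56),
}


def rate_ratios(rates):
    """Non-synonymous / synonymous rate for each gene."""
    return {gene: nonsyn / syn for gene, (nonsyn, syn) in rates.items()}


ratios = rate_ratios(RATES)
for gene, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{gene:<22} {ratio:.3f}")
```

The histones come out at zero and immunoglobulin κ highest, matching the contrast drawn in the following slides.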
21. Biol336-12 21
Molecular Evolution
Histones seem to have an unusually low replacement substitution rate.
This suggests that mutations causing amino acid changes in histones are deleterious.
Why?
22. Biol336-12 22
Molecular Evolution
Histones are DNA binding proteins around which DNA is coiled
to form chromatin. Many positions within the protein interact
with the DNA or other histones.
23. Biol336-12 23
Molecular Evolution
Most amino acid changes in histone proteins may have negative or even lethal consequences.
Histone proteins have strong functional constraints.
26. Biol336-12 26
Molecular Evolution
It could be that selection favours mutations in the variable regions of immunoglobulin genes, thereby increasing the diversity among antibodies produced by the body and improving the immune response.
28. Biol336-12 28
Molecular Evolution
To infer that selection has acted within a genome, one must reject the null hypothesis that no selection has acted.
Null hypothesis: describes the pattern of sequence evolution expected under mutation and drift alone.
Remember from neutral theory: the rate at which one nucleotide is replaced by another throughout a population (substitution) equals the rate of neutral mutation (μ) at that site.
29. Biol336-12 29
Molecular Evolution
How do we detect selection in DNA sequences?
Comparing intra-species polymorphism to inter-species differences (McDonald-Kreitman test).
Linked/neighbouring neutral markers.
Examining genes for Dn/Ds ratios.
30. Biol336-12 30
Molecular Evolution:
The McDonald Kreitman Test
Kreitman and Hudson (1991) sequenced a 4750-basepair region near the alcohol dehydrogenase (ADH) gene from 11 individuals of D. melanogaster and found higher-than-expected levels of polymorphism.
31. Biol336-12 31
Molecular Evolution:
The McDonald Kreitman Test
There is only one amino acid polymorphism (AdhF/AdhS) within this region, which occurs at site 1490.
32. Biol336-12 32
Molecular Evolution:
The McDonald Kreitman Test
Selection may be maintaining this polymorphism at or near this site.
33. Biol336-12 33
Molecular Evolution:
The McDonald Kreitman Test
ADH is an enzyme that breaks down ethanol.
Flies carrying the AdhF allele survive better when their food is spiked with ethanol than do flies carrying the AdhS allele (Cavener and Clegg 1981).
Nonetheless, the factor that maintains the AdhF/AdhS polymorphism remains unknown.
[Figure: alcohol dehydrogenase]
34. Biol336-12 34
Molecular Evolution:
The McDonald-Kreitman Test
How and why have molecular sequences evolved to be the way they are?
How do we explain the patterns of variation observed in ADH DNA sequences?
35. Biol336-12 35
Molecular Evolution:
McDonald Kreitman Test
Imagine that five sequences are obtained from each of two species, and that the sequences are related to each other as shown here.
36. Biol336-12 36
Molecular Evolution:
McDonald Kreitman Test
Any mutation that happens on a red branch will appear as a polymorphism within species 1.
Any mutation that happens on a blue branch will appear as a polymorphism within species 2.
Any mutation that happens on the green branch will appear as a fixed difference between the species.
[Figure: phylogeny of the sampled sequences; branches labelled "within species" and "between species"]
37. Biol336-12 37
Molecular Evolution:
McDonald Kreitman Test
Some abbreviations:
Within species
Ps = number of synonymous polymorphisms
Pn = number of non-synonymous polymorphisms
Between species
Ds = number of synonymous substitutions (fixed differences)
Dn = number of non-synonymous substitutions (fixed differences)
38. Biol336-12 38
Molecular Evolution:
McDonald Kreitman Test
If mutations occur randomly over time, and if the chance that a mutation does or does not cause an amino acid change remains constant, then the ratio of replacement to silent changes should be the same along any of these branches.
[Figure: phylogeny; label "Between species"]
39. Biol336-12 39
Molecular Evolution:
McDonald Kreitman Test
If mutations are neutral, any of these mutations has an equal chance of persisting.
So the ratio of replacement to silent polymorphisms within a species (Pn/Ps) should be the same as the ratio of replacement to silent differences fixed between species (Dn/Ds).
[Figure: phylogeny labelled Dn/Ds (between species) and Pn/Ps (within species)]
40. Biol336-12 40
Molecular Evolution
The McDonald-Kreitman Test:
Ho: If all changes are neutral, the ratio of
replacement to silent changes at polymorphic sites
(within species) should equal the ratio among fixed
differences (between species).
H1: If replacement mutations are advantageous,
they fix rapidly, causing a higher replacement to
silent ratio between species and a lower
replacement to silent ratio within species.
41. Biol336-12 41
Molecular Evolution
The McDonald-Kreitman Test:
H2: If replacement mutations are deleterious, they
rarely fix. Thus there will be a lower ratio of
replacement to silent changes between species and
a higher replacement to silent ratio within species.
H3: If replacement mutations are subject to
heterozygote advantage or frequency dependent
selection, they rarely fix, causing a lower
replacement to silent ratio between species and a
higher replacement to silent ratio within species.
43. Biol336-12 43
Molecular Evolution:
McDonald Kreitman Test
ADH gene            Fixed differences (between species)   Polymorphisms (within species)
Replacement         7                                     2
Silent              17                                    42
Between species: ratio of replacement to silent = 7/17 = 0.41
Within species: ratio of replacement to silent = 2/42 = 0.05
Replacement-to-silent ratio: fixed differences > polymorphisms
44. Biol336-12 44
Molecular Evolution:
McDonald Kreitman Test
Using a χ² test, the null hypothesis that selection is absent is statistically rejected for ADH.
The excess of replacement differences between species suggests that replacement mutations have been positively favoured.
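The χ² test on the ADH counts can be reproduced with a few lines of Python. This sketch uses the standard 2×2 contingency χ² without continuity correction and takes the 1-degree-of-freedom p-value from the complementary error function. One expected count (fixed replacement changes, about 3.2) is below 5, which is why exact tests or G-tests are often preferred for tables this small. The neutrality index, NI = (Pn/Ps)/(Dn/Ds), is an extra summary not shown on the slides; values below 1 indicate an excess of replacement differences between species.

```python
import math


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman 2x2 test: chi-square (1 df, no correction) and neutrality index."""
    observed = [[dn, pn], [ds, ps]]  # rows: replacement, silent; columns: fixed, polymorphic
    total = dn + ds + pn + ps
    row_totals = [dn + pn, ds + ps]
    col_totals = [dn + ds, pn + ps]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    neutrality_index = (pn / ps) / (dn / ds)
    return chi2, p_value, neutrality_index


# ADH counts from the table: Dn=7, Ds=17, Pn=2, Ps=42
chi2, p_value, ni = mk_test(dn=7, ds=17, pn=2, ps=42)
print(f"chi2={chi2:.2f}  p={p_value:.4f}  NI={ni:.3f}")
```

With these counts χ² is about 8.2 (p < 0.01), consistent with rejecting the neutral null for ADH.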
45. Biol336-12 45
Molecular Evolution:
McDonald Kreitman Test
Assumes:
All synonymous mutations are neutral (violated if there is codon bias).
All non-synonymous mutations are either strongly deleterious, neutral or strongly advantageous.
Levels of polymorphism are governed by the neutral mutation rate.
Within a species, advantageous mutations contribute little to polymorphism but can contribute to divergence between species.
Problems with this test:
A failure to reject the null hypothesis could be because both purifying and directional selection have taken place.
Not all synonymous changes are in fact neutral. In some organisms, some codons are preferentially used (codon bias).
47. Biol336-12 47
Molecular Evolution:
Neighbouring marker sites
If a beneficial mutation appears and sweeps
through a population, what will happen to the
level of polymorphism present at neighbouring
DNA sites?
48. Biol336-12 48
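Molecular Evolution:
Neighbouring marker sites
If a beneficial mutation appears and sweeps through a population, what will happen to the level of polymorphism present at neighbouring DNA sites?
Genetic hitchhiking will decrease variation: neutral alleles linked to the beneficial mutation are carried to high frequency along with it, while alternative alleles at those sites are lost.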
49. Biol336-12 49
Molecular Evolution:
Neighbouring marker sites
In the case of Plasmodium falciparum, diversity at neighbouring marker loci decreased.
51. Biol336-12 51
Molecular Evolution:
Neighbouring marker sites
If there is overdominance at a nucleotide site, what will happen to the level of polymorphism at neighbouring sites?
Variation at linked sites is more likely to be maintained.
52. Biol336-12 52
Molecular Evolution:
Neighbouring marker sites
If there is directional selection to remove a particular mutant allele (purifying selection), what will happen to a marker allele that happens to be on the same chromosome?
It will decrease in frequency as a result of this association. This is called background selection.
53. Biol336-12 53
Molecular Evolution
So what is the evidence for natural selection shaping DNA sequences?
Nielsen et al. (2005) PLoS Biology
H0 (neutral): Dn/Ds = 1
H1 (positive selection): Dn/Ds > 1
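To make the Dn/Ds comparison concrete, here is a simplified, Nei-Gojobori-style count of synonymous and non-synonymous sites and differences between two aligned coding sequences. It is a sketch under stated assumptions: the standard genetic code (NCBI translation table 1), no gaps, codons differing at no more than one position, changes to stop codons counted as non-synonymous, and no correction for multiple hits and no statistical test. Published genome scans typically rely on likelihood-based codon models instead. The sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites in a codon: each of its 9 point mutations counts 1/3 of a site."""
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1 / 3
    return syn


def dn_ds(seq_a, seq_b):
    """Return (pN, pS, pN/pS) for two aligned coding sequences."""
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for start in range(0, len(seq_a), 3):
        ca, cb = seq_a[start:start + 3], seq_b[start:start + 3]
        s = (synonymous_sites(ca) + synonymous_sites(cb)) / 2  # average over both codons
        syn_sites += s
        nonsyn_sites += 3 - s
        n_diff = sum(x != y for x, y in zip(ca, cb))
        if n_diff > 1:
            raise ValueError("sketch only handles codons differing at one position")
        if n_diff == 1:
            if CODE[ca] == CODE[cb]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites
    p_s = syn_diffs / syn_sites
    return p_n, p_s, p_n / p_s


# Hypothetical sequences: AAA->AGA (Lys->Arg) and CTG->CTA (Leu->Leu)
p_n, p_s, ratio = dn_ds("AAACTGGTT", "AGACTAGTT")
print(f"pN={p_n:.3f}  pS={p_s:.3f}  pN/pS={ratio:.3f}")
```

Under the hypotheses above, a ratio near 1 would fit H0 (neutral) and a ratio above 1 would point to H1 (positive selection); the value below 1 in this toy example would usually be read as purifying selection.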
55. Biol336-12 55
Molecular Evolution
How can you detect the signature of selection?
Comparing intra-species polymorphism to inter-species differences (McDonald-Kreitman test).
Linked/neighbouring neutral markers.
Examine genes for Dn/Ds ratios.
56. Biol336-12 56
Molecular Evolution
Zayed and Whitfield (2008) PNAS
If drift and demography are important, then the effects will be seen across the whole genome.
If selection is important, then the effects will be seen in specific regions of the genome.
Editor's Notes
#1:Molecular evolution encompasses two areas of study:
The evolution of macromolecules: the rates and patterns of change in the genetic material (DNA sequences) and in the encoded products (proteins)
The evolutionary history of genes and organisms
#3:This field has its roots in two separate disciplines: population genetics and molecular biology. Population genetics provides the theoretical background and molecular biology provides the empirical data.
The first complete sequence of a protein (insulin) was determined in 1952 by F. Sanger and colleagues.
#6:Here are some sequences taken from two Arabidopsis thaliana individuals, 145 and 134. These are MADS-box genes important for flower development. On the left is the identifier: gene name, species, and individual.
Bolded in black is the reference individual. We will compare this sequence to the others.
Between AgAt145 and AgAt134 there are two changes in the third codon position. Both are synonymous changes because the amino acid stays the same. The first, G->A (a change from one purine to another), is a transition, but the second, G->C (a change from a purine to a pyrimidine), is an example of a transversion.
Comparing AgAt145 to Sep1134 and Sep1145, there is an A->G change (a transition) and a T->G change (a transversion). This time lysine changes to arginine in the first case and leucine changes to valine in the second, so both are nonsynonymous changes because the amino acid changes.
Comparing AgAt145 to CalAt145 and CalAt134, we see the deletion of a codon, creating a gap.
There is also a G->T change (a transversion) resulting in a nonsynonymous change: valine changes to phenylalanine.
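The distinctions drawn in this note (transition vs. transversion, synonymous vs. nonsynonymous) can be made mechanical. The sketch below assumes the standard genetic code and codons that differ at exactly one position; the example codons are hypothetical ones chosen to mirror the kinds of changes described, not the actual AGAMOUS sequences.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
PURINES = {"A", "G"}


def classify_change(codon_a, codon_b):
    """Classify a single-nucleotide codon change as (kind, effect)."""
    diffs = [i for i in range(3) if codon_a[i] != codon_b[i]]
    if len(diffs) != 1:
        raise ValueError("codons must differ at exactly one position")
    x, y = codon_a[diffs[0]], codon_b[diffs[0]]
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    kind = "transition" if (x in PURINES) == (y in PURINES) else "transversion"
    same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    effect = "synonymous" if same_aa else "nonsynonymous"
    return kind, effect


examples = [
    ("CTG", "CTA"),  # G->A, Leu -> Leu
    ("CTG", "CTC"),  # G->C, Leu -> Leu
    ("AAA", "AGA"),  # A->G, Lys -> Arg
    ("TTG", "GTG"),  # T->G, Leu -> Val
    ("GTT", "TTT"),  # G->T, Val -> Phe
]
for a, b in examples:
    print(a, b, classify_change(a, b))
```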
#7:Mutations occur. Sometimes they are the result of errors in DNA replication or DNA repair that change single nucleotides; sometimes the changes are larger, creating deletions and insertions.
When we ask HOW and WHY molecular sequences have evolved the way they have, we really want to know what happens to a mutation once it arises. Does it remain in the population as one of several alleles? In that case we consider it a polymorphism.
If, on the other hand, it fixes in the population, the change is considered a substitution.
#8:At the start (generation 0), every individual in the population has the sequence aaat. As time passes, a mutation arises in individual 5. Now there are two different sequences segregating in the population: a polymorphism. This polymorphism is present in several individuals within the population and persists from generation 10 to about generation 29. Finally, in generation 30, it is fixed: every individual in the population has a "c" at the second nucleotide. This is the same as the allele frequency p reaching 1.
Again in generation 40, a new mutation arises and we have a polymorphism again.
#9:Any mutation that happens on a red branch will appear as a polymorphism within species one
#10:Here the phylogeny is divided into two parts: between species branches and within species branches.
Within species branches connect all the alleles within each species to their most recent common ancestor.
Between species branches connect these common ancestors to the common ancestor of the whole phylogeny.
A mutation on a between species branch will appear in all the descendant alleles and thus will be a fixed difference between species. A mutation on a within species branch will be a polymorphism within a species
#13:A new mutant arising as a single copy in a diploid population of size N has an initial frequency of 1/2N. If only drift is acting, what is the probability of fixation for that neutral allele? 1/2N
#14:The substitution rate for advantageous alleles is the probability of fixation of an advantageous allele multiplied by the number of new mutants arising every generation.
For positive values of s, when N is large, the probability of fixation is 2s.
If the absolute value of s is small, the probability of fixation is 2s/(1 - exp(-4Ns)).
#15:These last two results are noteworthy because it means advantageous mutations don’t always fix in a population. In the case of an advantageous mutation with s=0.01, the probability it will fix is 2% but that also means 98% of all the mutations with the selective advantage of 0.01 are lost.
On the other hand, even slightly deleterious mutations have a finite (albeit small) chance of fixing in a population.
#18:The 1960s witnessed a revolution in population genetics. The introduction of electrophoresis into population genetics studies soon led to the discovery of large amounts of genetic variability in natural populations such as humans and Drosophila.
In 1968 Kimura postulated that the majority of molecular changes in evolution were due to the random fixation of neutral or nearly neutral mutations. This created a dispute between neutralists and selectionists. The dispute essentially concerns the distribution of fitness values of mutant alleles.
#20:From this table, it is clear that the rate of nonsynonymous substitution varies among genes, ranging from zero to about 2 × 10^-9 substitutions per nonsynonymous site per year.
Histones have an unusually low replacement substitution rate.
Look at the column describing the rate of synonymous substitution. It also varies, though not as much as the rate of nonsynonymous substitution.
#22:Looking at H3 and H4 it is clear there is some interaction with both the DNA and other histones
#26:Immunoglobulins are proteins, encoded by immunoglobulin genes, that are found in the blood or other bodily fluids of vertebrates and are used by the immune system to identify and neutralize foreign objects. The small region at the tip of the protein is extremely variable. Each variant can bind a different target, or antigen. This huge diversity allows the immune system to recognize an equally wide diversity of antigens.
#38:Remember that we have divided nucleotide substitutions in a coding region into two types: replacement (non-synonymous) and synonymous.
For a particular phylogeny and mutation rate, if mutations occur randomly over time and if the chance that a mutation does or doesn't cause a change in the amino acid remains constant, then the ratio of replacement changes to silent changes should be the same along any of these branches.