GENE315: Measuring genome function
Paul Gardner
March 20, 2023
Objectives for lecture 07
▶ An understanding of:
▶ How “function” in a genomic context can be defined. This is a
surprisingly philosophical problem.
▶ Some strategies for determining if a genomic region is likely to
be functional or not.
OK Go (2010) This Too Shall Pass.
Rube Goldberg machine
What are the “functional” bits in a genome?
hg38 chr11:62,851,834-62,855,980.
A famous region...
What is “functional”?
▶ ENCODE: “biochemically functional” i.e. anything with a
reproducible biochemical activity is functional.
▶ Biological philosophy:
▶ “That A implies B does not entail that B must imply A.”
▶ I.e. that protein & ncRNA genes are transcribed does not imply
that all transcribed elements are genes (or functional).
▶ “Is the mere fact that X causes Y enough to say that Y is X’s
proper function?”
▶ Doolittle & Brunet describe three possibilities based upon
Darwinian evolution:
▶ Selected effect: a trait that was under positive selection in
previous generations (now maintained by negative selection).
▶ Constructive neutral evolution: negative selection of traits
that were never under positive selection.
▶ Exaptation: traits shaped by natural selection for some use
other than their current one.
▶ I.e. evidence of evolutionary selection is required before
claiming a function.
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
What is your theoretical model of the cell?
Image sources: Wikimedia Commons & Prof. David S. Goodsell
What is “functional”?
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
What is “functional”? – the Gardner simplification
EVIDENCE FOR MAINTENANCE BY
NEGATIVE SELECTION, OR
CURRENTLY UNDER POSITIVE
SELECTION
Based on Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC
Biology.
Some useful definitions
▶ Negative or purifying selection: the selective removal of alleles
that are deleterious to fitness.
▶ Positive selection: the process by which alleles that increase
organismal fitness also increase in frequency in a population.
Booker et al. (2017) Detecting positive selection in the genome. BMC Biology.
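A minimal sketch of how these two definitions play out, assuming a deterministic haploid model in which an allele has relative fitness 1 + s. The model, the function names and the values of s are illustrative assumptions, not taken from Booker et al. With s > 0 the allele increases in frequency (positive selection), with s < 0 it is removed (negative selection), and with s = 0 its frequency does not change in this model.

```python
def next_frequency(p, s):
    """Allele frequency after one generation of haploid selection.

    p: current frequency of the allele (0..1)
    s: selection coefficient; the allele has relative fitness 1 + s
    """
    return p * (1 + s) / (1 + p * s)


def trajectory(p0, s, generations):
    """Frequencies from generation 0 to `generations` (inclusive)."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Hypothetical values chosen only to make the trend visible.
    beneficial = trajectory(p0=0.01, s=0.05, generations=200)
    deleterious = trajectory(p0=0.50, s=-0.05, generations=200)
    print(f"beneficial allele:  {beneficial[0]:.3f} -> {beneficial[-1]:.3f}")
    print(f"deleterious allele: {deleterious[0]:.3f} -> {deleterious[-1]:.6f}")
```

Real populations are finite, so genetic drift would add noise to these trajectories; the sketch leaves drift out deliberately to isolate the effect of selection.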
▶ Legend has it that the charismatic inventor of the
“Kahungunu Wave” left many descendants (Ngā Tukemata o
Kahungunu)
Some useful definitions
▶ Constructive neutral evolution: how complex biological
systems might arrive via a series of neutral events.
▶ Almost synonymous with “Negative Selection”, as used by
Doolittle.
▶ Exaptation: a trait (or gene) evolved to serve one particular
function, but subsequently used to serve another.
Lukeš et al. (2011) How a Neutral Evolutionary Ratchet Can Build Cellular Complexity. IUBMB Life.
A great example of Exaptation
▶ Xist (X-inactive specific transcript) is a long non-coding RNA
required for X-inactivation in placental mammals
▶ Xist was derived/exapted from a pseudogenized protein-coding
gene (Lnx3)
▶ Mutually exclusive phylogenetic pattern with Lnx3, sequence
& synteny preserved and out-of-frame mutations in the
mammals...
Duret et al (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene.
Science.
Doolittle & Brunet quotes
▶ On accumulating transposable elements: “There is no
selective advantage to individuals within a species for doing
this, and evolution has no foresight [15]!”
▶ “A function concept embedded in one definitional framework
(SE) was (supposedly) refuted by empirical data based on
quite another (CR).”
▶ “it is the irrelevance of the majority of TEs (at least half of
our own DNA) to fitness at the organismal level that means
that “junk” is likely always to be a reasonable way to refer to
it”
▶ Final sentence: “If lungfishes have junk, or at the least
extraordinarily weakly or diffusely functional DNA in their
genomes, why is it that we think we do not?”
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
What is “functional”? – a more holistic point of view.
▶ Argue for using combinations of biochemical, evolutionary,
and genetic evidence to elucidate genome function in human
biology and disease. [1]
Kellis et al. (2014) Defining functional DNA elements in the human genome. PNAS.
[1] There are many different levels of evolutionary evidence!
Our attempt...
[Figure: three dot-plot panels (Proteins, Short ncRNAs, Long ncRNAs), each ranking 30 features by mean decrease in Gini coefficient over 100 runs. Points are coloured by feature type: intrinsic sequence, sequence conservation, transcriptome expression, genomic repeat association, protein/RNA specific features, and population variation.]
[Features plotted: 1kGP SNP Count, 1kGP SNP Density, 1kGP Average MAF, gnomAD SNP Count, gnomAD SNP Density, gnomAD Average MAF, Covariance Min E-value, Covariance Max, Secondary Structure MFE, Interaction Energy Avg, Interaction Energy Min, Accessibility, Repeat Distance Sum, Repeat Distance Min, Genomic copies (E-val<0.01), Neutral Predictor, RNAalifold Score, RNAcode Coding Potential, Fickett score, GC%, GERP Score Avg, GERP Score Max, PhastCons Avg, PhastCons Max, PhyloP Avg, PhyloP Max, Tissue RPKM, Tissue MRD, Primary Cell RPKM, Primary Cell MRD.]
Cooper & Gardner (2021) Features of Functional Human Genes. bioRxiv.
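The importance measure on this slide's axis, mean decrease in Gini, is built from the drop in Gini impurity produced when a decision tree splits the data on a feature. The sketch below computes that drop for a single threshold split; a random forest would sum and average such drops over many trees and splits. The toy scores, labels and function names are hypothetical and are not taken from Cooper & Gardner.

```python
def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_decrease(values, labels, threshold):
    """Impurity drop when samples are split at `values <= threshold`."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted_children


if __name__ == "__main__":
    # Hypothetical toy data: four "functional" and four "control" regions.
    labels = ["functional"] * 4 + ["control"] * 4
    conservation = [0.90, 0.80, 0.70, 0.95, 0.10, 0.20, 0.30, 0.15]
    gc_content = [0.40, 0.60, 0.40, 0.60, 0.40, 0.60, 0.40, 0.60]
    print("conservation split:", gini_decrease(conservation, labels, 0.5))
    print("GC% split:         ", gini_decrease(gc_content, labels, 0.5))
```

In this toy data the conservation score separates the two classes perfectly, so its split removes all impurity, while the GC% split removes none. A feature that repeatedly produces large drops ends up with a high mean decrease in Gini.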
Random sequence experiments identify lots of transcription
(& translation)
Recall Sean Eddy’s “Random Chromosome Project”?
▶ Neme et al (2017) “Random sequences are an abundant
source of bioactive RNAs or peptides”. Nature Ecology &
Evolution.
▶ ... “by expressing clones with random sequences in E. coli” ...
“Contrary to expectations, we find that random sequences
with bioactivity are not rare.”
▶ de Boer et al (2019) “Deciphering eukaryotic gene-regulatory
logic with 100 million random promoters”. Nature
Biotechnology.
▶ ... “we measure the expression output of > 100 million
synthetic yeast promoter sequences that are fully random” ...
“it is often tacitly assumed that functional TFBSs are rare” ...
“TFBSs are common in random DNA” ... “Abundant weak
regulatory interactions explain most of expression level”
NB. expression from random sequence is also reproducible!
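A rough piece of arithmetic helps explain why short binding motifs turn up in random DNA. Assuming equal base frequencies and an exact k-mer motif (both simplifications, since real TFBSs are degenerate and would match even more often), the expected number of chance matches on one strand is (L - k + 1) / 4^k. The function names, the example motif and the sequence length below are illustrative assumptions, not values from Neme et al. or de Boer et al.

```python
import random


def expected_chance_matches(seq_length, motif_length):
    """Expected exact matches of one k-mer on one strand of uniform random DNA."""
    positions = seq_length - motif_length + 1
    if positions <= 0:
        return 0.0
    return positions / (4 ** motif_length)


def count_matches(sequence, motif):
    """Count (possibly overlapping) exact occurrences of `motif` in `sequence`."""
    k = len(motif)
    return sum(1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == motif)


def random_dna(length, seed):
    """Uniform random DNA; seeded so that runs are repeatable."""
    rng = random.Random(seed)
    return "".join(rng.choices("ACGT", k=length))


if __name__ == "__main__":
    motif = "TGACTC"  # arbitrary example 6-mer
    print("expected per 1 kb:", round(expected_chance_matches(1000, len(motif)), 3))
    observed = sum(count_matches(random_dna(1000, seed), motif) for seed in range(1000))
    print("observed mean over 1000 random 1 kb sequences:", observed / 1000)
```

With hundreds of transcription factors, each recognising its own short motifs, even a low per-motif expectation adds up to many chance sites per kilobase.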
Based upon the Doolittle & Brunet classification...
▶ Are the following elements likely to be a “Selected
Effect”, “Constructive Neutral Evolution”, “Exapted” or
no beneficial effect?
Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
Complete the following table in small groups...
Ultra-conserved elements (UCEs; also called ultra-conserved regions, UCRs)
▶ Highly conserved sequence (in mammals)
Bejerano et al. (2004) Ultraconserved Elements in the Human Genome. Science.
Snetkova et al. (2022) Perfect and imperfect views of ultraconserved sequences. Nature Reviews Genetics.
UCNEbase: ultra-conserved non-coding elements database
Human accelerated regions (HARs)
▶ Regions showing positive selection since humans diverged from the
human-chimpanzee common ancestor.
▶ Whalen & Pollard (2022) Enhancer Function and Evolutionary
Roles of Human Accelerated Regions. Annual Review of
Genetics.
Example: HAR1
▶ Compare genome sequences to find otherwise conserved regions
that are evolving rapidly in humans.
▶ HAR1, in particular, appears to be involved in cortical
development.
▶ Pollard et al. (2006) Forces Shaping the Fastest Evolving
Regions in the Human Genome. PLOS Genetics.
▶ Pollard et al. (2006) An RNA gene expressed during cortical
development evolved rapidly in humans. Nature.
[Figure: UCSC Genome Browser screenshot, hg38 chr20, with coordinate ticks from 63,102,050 to 63,102,450 (100-base scale bar). Tracks: BLAT search sequence; GENCODE V39 basic gene and pseudogene annotation (HAR1A, HAR1B, ENSG00000274915); NHGRI-EBI GWAS Catalog; 100 vertebrates basewise conservation by PhyloP; Multiz alignments of 100 vertebrates (chimp through chicken); common short genetic variants from dbSNP release 153; repeating elements by RepeatMasker.]
Wikipedia article: Human accelerated regions
HAR1 in the UCSC genome browser.
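The comparison described for HAR1 can be sketched as counting lineage-specific substitutions in a three-way alignment: where human and chimp differ, an outgroup indicates which lineage changed. This is a simplified parsimony count, not the statistical test used by Pollard et al.; the function name and the toy sequences are hypothetical.

```python
def lineage_specific_substitutions(human, chimp, outgroup):
    """Count substitutions assigned to the human or chimp lineage by parsimony.

    All three strings must be aligned (equal length). Columns containing a
    gap ('-') and columns where all three species differ are skipped.
    Returns (human_specific, chimp_specific).
    """
    if not (len(human) == len(chimp) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    human_specific = 0
    chimp_specific = 0
    for h, c, o in zip(human, chimp, outgroup):
        if "-" in (h, c, o) or h == c:
            continue
        if c == o:
            human_specific += 1   # chimp retains the outgroup state
        elif h == o:
            chimp_specific += 1   # human retains the outgroup state
    return human_specific, chimp_specific


if __name__ == "__main__":
    # Hypothetical toy alignment, not real HAR1 sequence.
    outgroup = "ACGATGCTAT"
    chimp    = "ACGATGCTAC"
    human    = "ACGTTGCAAT"
    print(lineage_specific_substitutions(human, chimp, outgroup))
```

A candidate accelerated region is one where the element is otherwise conserved across species but the human-specific count is unusually high for its length.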
7SL RNA
▶ Also called SRP RNA, part of the signal recognition particle,
involved in protein transport
▶ Universally conserved, orthologues are found in archaea,
bacteria and eukaryotes.
[Figure: UCSC Genome Browser screenshot, hg38 chr14, with coordinate ticks from 49,586,450 to 49,587,050 (200-base scale bar). Tracks: GENCODE V39 basic gene and pseudogene annotation (RPS29, RN7SL1); NHGRI-EBI GWAS Catalog; vertebrate Multiz alignment & conservation (100 species; Cons 100 Verts); Multiz alignments of 100 vertebrates (chimp through chicken); common short genetic variants from dbSNP release 153; repeating elements by RepeatMasker.]
Wikipedia article: Signal recognition particle
7SL in the UCSC genome browser.
Alu elements (I)
▶ Transposable element, derived from 7SL RNA. Over one
million copies in the human genome
▶ Breaks a LOT of bioinformatics tools, frequently masked from
genome sequences
Wikipedia article: Alu element
Dfam entry for the AluY family
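A back-of-the-envelope calculation shows why Alu elements matter for genome-scale analyses. The copy number comes from the slide; the typical element length (about 300 bp) and the haploid genome size (about 3.1 Gb) are round-number assumptions added here for illustration.

```python
ALU_COPIES = 1_000_000            # "over one million copies" (from the slide)
ALU_LENGTH_BP = 300               # assumption: typical full-length Alu element
GENOME_SIZE_BP = 3_100_000_000    # assumption: approximate haploid human genome


def genome_fraction(copies, length_bp, genome_bp):
    """Fraction of the genome occupied by `copies` elements of `length_bp`."""
    return copies * length_bp / genome_bp


if __name__ == "__main__":
    fraction = genome_fraction(ALU_COPIES, ALU_LENGTH_BP, GENOME_SIZE_BP)
    print(f"Alu elements cover roughly {fraction:.1%} of the genome")
```

Under these assumptions Alu elements account for roughly a tenth of the genome, which is why unmasked Alu copies can swamp homology searches and alignments.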
Alu elements (II)
▶ Is the Alu element on human chromosome 14, between
31,474,114 and 31,473,798 likely to be functional?
[Figure: UCSC Genome Browser screenshot, hg38 chr14, with coordinate ticks from 31,473,750 to 31,474,150 (100-base scale bar). Tracks: BLAT search sequence; GENCODE V39 basic gene and pseudogene annotation; NHGRI-EBI GWAS Catalog; 100 vertebrates basewise conservation by PhyloP; Multiz alignments of 100 vertebrates (chimp through chicken); common short genetic variants from dbSNP release 153; repeating elements by RepeatMasker.]
An Alu element in the UCSC genome browser.
Mills et al. (2007) Which transposable elements are active in the human genome? Trends in Genetics.
The main points
▶ “Function” can be measured in a variety of ways:
▶ Phenotypic: “biochemical” activity
▶ Evolutionary: evidence of positive/negative selection
▶ Genetic: knockout or mutation and compensation/complement
▶ A Darwinian definition of “function”: phenotype/activity
AND evolutionary evidence (SE, CNE or Exaptation)
▶ Evolutionary evidence can be varied: usually based on
sequence conservation or reduced SNPs
▶ Some suggest requiring overlapping evidence of biochemical
activity & evolutionary conservation (i.e. regions under negative
selection).
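Requiring overlapping evidence amounts to intersecting two sets of genomic intervals, for example regions with biochemical activity and regions that are evolutionarily conserved. A minimal sketch, assuming both sets lie on the same chromosome, use half-open (start, end) coordinates, and contain no overlaps within a set; the function name and the coordinates are hypothetical.

```python
def intersect_intervals(first, second):
    """Return the regions covered by both interval lists.

    Each list holds half-open (start, end) tuples on one chromosome and is
    assumed to contain no overlapping intervals within itself.
    """
    a = sorted(first)
    b = sorted(second)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    biochemically_active = [(100, 200), (300, 400)]
    conserved = [(150, 350), (390, 500)]
    print(intersect_intervals(biochemically_active, conserved))
```

Tools such as bedtools perform the same operation on real annotation files; the sketch only shows the logic of the overlap criterion.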
Self-evaluation exercises
▶ Compare and contrast the ENCODE definition of function to
the Darwinian definition proposed by Doolittle & Brunet
(2017).
▶ Outline a method for determining if a predicted mouse gene is
functional. What are potential pitfalls for the approach?
▶ Where do human-accelerated regions, ultraconserved regions,
Xist, BC1 RNA and Junk DNA lie in Doolittle & Brunet
(2017) classification of functional DNA?
▶ Justify your answers!
Further reading
▶ Doolittle & Brunet (2017) On causal roles and selected
effects: our genome is mostly junk. BMC Biology.
▶ Kellis et al. (2014) Defining functional DNA elements in the
human genome. PNAS.
▶ Neme et al (2017) Random sequences are an abundant source
of bioactive RNAs or peptides. Nature Ecology & Evolution.
Post questions on the FAQ GoogleDoc:
https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv-
qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing
Genetics
Mātai Ira
otago.ac.nz/genetics
Social Mixer
WHEN
Wednesday 29th March 6.00pm to 8pm
WHERE
BIG13, ground floor Biochemistry
PIZZA&SOFTDRINKS
PROVIDED!
Near Rakiriri, Otago Harbour, 7 May 2020.

ppgardner-lecture07-genome-function.pdf

  • 1. GENE315: Measuring genome function Paul Gardner March 20, 2023
  • 2. Objectives for lecture 07 ▶ An understanding of: ▶ How “function” in a genomic context can be defined. This is a surprisingly philosophical problem. ▶ Some strategies for determining if a genomic region is likely to be functional or not. OK Go (2010) This Too Shall Pass. Rube Goldberg machine
  • 3. What are the “functional” bits in a genome? HG38 chr11::62,851,834-62,855,980. A famous region...
  • 4. What is “functional”? ▶ ENCODE: “biochemically functional” i.e. anything with a reproducible biochemical activity is functional. ▶ Biological philosophy: ▶ “That A implies B does not entail that B must imply A.” ▶ I.e. Protein & ncRNA genes are transcribed, does not imply that all transcribed elements are genes (or functional). ▶ “Is the mere fact that X causes Y enough to say that Y is X’s proper function?” ▶ Doolittle & Brunet describe three posibilities based upon Darwinian evolution: ▶ Selected effect: a trait that was under positive selection in previous generations (now maintained by negative selection). ▶ Constructive neutral evolution: negative selection of traits that were never under positive selection. ▶ Exaptation: traits shaped by natural selection for some use other than their current one. ▶ I.e. evidence of evolutionary selection is required before claiming a function. Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 5. What is your theoretical model of the cell? Image sources: Wikimedia Commons & Prof. David S. Goodsell
  • 6. What is “functional”? Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 7. What is “functional”? – the Gardner simplification EVIDENCE FOR MAINTENANCE BY NEGATIVE SELECTION, OR CURRENTLY UNDER POSITIVE SELECTION Based on Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 8. Some useful definitions ▶ Negative or purifying selection: the selective removal of alleles that are deleterious to fitness. ▶ Positive selection: Positive selection is the process by which alleles that increase organismal fitness also increase in frequency in a population. Booker et al. (2017) Detecting positive selection in the genome. BMC Biology.
  • 9. ▶ Legend has it that the charismatic inventor of the “Kahungunu Wave” left many descendants (Ngā Tukemata o Kahungunu)
  • 10. Some useful definitions ▶ Constructive neutral evolution: how complex biological systems might arrive via a series of neutral events. ▶ Almost synonymous with “Negative Selection”, as used by Doolittle. ▶ Exaptation: a trait (or gene) evolved to serve one particular function, but subsequently used to serve another. Lukes̆ et al. (2011) How a Neutral Evolutionary Ratchet Can Build Cellular Complexity. IUBMB Life.
  • 11. A great example of Exaptation ▶ Xist (X-inactive specific transcript) is a long non-coding RNA required for X-inactivation in placental mammals ▶ Xist derived/exapted from a protein-coding pseudogene (Lnx3) ▶ Mutually exclusive phylogenetic pattern with Lnx3, sequence & synteny preserved and out-of-frame mutations in the mammals... Duret et al (2006) The Xist RNA gene evolved in eutherians by pseudogenization of a protein-coding gene. Science.
  • 12. Doolittle & Brunet quotes ▶ On accumulating transposable elements: “There is no selective advantage to individuals within a species for doing this, and evolution has no foresight [15]!” ▶ “A function concept embedded in one definitional framework (SE) was (supposedly) refuted by empirical data based on quite another (CR).” ▶ “it is the irrelevance of the majority of TEs (at least half of our own DNA) to fitness at the organismal level that means that “junk” is likely always to be a reasonable way to refer to it” ▶ Final sentence: “If lungfishes have junk, or at the least extraordinarily weakly or diffusely functional DNA in their genomes, why is it that we think we do not?” Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 13. What is “functional”? – a more holistic point of view. ▶ Argue for using combinations of biochemical, evolutionary, and genetic evidence to elucidate genome function in human biology and disease.1 Kellis et al. (2014) Defining functional DNA elements in the human genome. PNAS. 1. there are many different levels of evolutionary evidence!
  • 14. Our attempt... ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1kGP SNP Count 1kGP Average MAF gnomAD Average MAF gnomAD SNP Count Covariance Min E−value Secondary Structure MFE Interaction Energy Avg Interaction Energy Min gnomAD SNP Density Accessibility Repeat Distance Sum 1kGP SNP Density Neutral Predictor Repeat Distance Min Genomic copies (E−val<0.01) RNAalifold Score Fickett score GERP Score Avg GC% Primary Cell RPKM Covariance Max RNAcode Coding Potential Tissue RPKM Primary Cell MRD PhastCons Avg GERP Score Max Tissue MRD PhyloP Avg PhyloP Max PhastCons Max 0 50 100 150 Mean Decrease in Protein−coding Gini Coefficient over 100 runs ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1kGP SNP Count 1kGP Average MAF Fickett score gnomAD Average MAF gnomAD SNP Count RNAcode Coding Potential gnomAD SNP Density Neutral Predictor RNAalifold Score Genomic copies (E−val<0.01) Covariance Max 1kGP SNP Density Covariance Min E−value GC% Repeat Distance Min Repeat Distance Sum Interaction Energy Min PhyloP Max GERP Score Max GERP Score Avg Secondary Structure MFE Accessibility Interaction Energy Avg PhastCons Max PhyloP Avg PhastCons Avg Tissue MRD Tissue RPKM Primary Cell RPKM Primary Cell MRD 0 50 100 Mean Decrease in Short ncRNA Gini Coefficient over 100 runs ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1kGP SNP Count Genomic copies (E−val<0.01) gnomAD SNP Count PhastCons Max RNAalifold Score Secondary Structure MFE Accessibility Interaction Energy Avg 1kGP SNP Density gnomAD Average MAF PhyloP Avg 1kGP Average MAF Interaction Energy Min PhastCons Avg Covariance Min E−value Repeat Distance Min Neutral Predictor RNAcode Coding Potential Repeat Distance Sum gnomAD SNP Density Fickett score PhyloP Max GERP Score Avg GERP Score Max Covariance Max Primary Cell RPKM GC% Tissue RPKM Tissue MRD Primary Cell MRD 20 40 60 Mean Decrease in lncRNA Gini Coefficient over 100 runs Feature type ● ● ● ● ● ● Intrinsic sequence Sequence 
conservation Transcriptome expression Genomic repeat association Protein/RNA specific features Population variation Proteins Short ncRNAs Long ncRNAs Cooper & Gardner (2021) Features of Functional Human Genes. bioRxiv.
  • 15. Random sequence experiments identify lots of transcription (& translation) Recall Sean Eddy’s “Random Chromosome Project”? ▶ Neme et al (2017) “Random sequences are an abundant source of bioactive RNAs or peptides”. Nature Ecology & Evolution. ▶ ... “by expressing clones with random sequences in E. coli” ... “Contrary to expectations, we find that random sequences with bioactivity are not rare.” ▶ de Boer et al (2019) “Deciphering eukaryotic gene-regulatory logic with 100 million random promoters”. Nature Biotechnology. ▶ ... “we measure the expression output of > 100 million synthetic yeast promoter sequences that are fully random” ... “ it is often tacitly assumed that functional TFBSs are rare” ... “TFBSs are common in random DNA” ... “Abundant weak regulatory interactions explain most of expression level” NB. expression from random sequence is also reproducible!
  • 16. Based upon the Doolittle & Brunet classification... ▶ Are the following elements likely to be a “Selected Effect”, “Constructive Neutral Evolution”, “Exapted” or no beneficial effect? Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology.
  • 17. Complete the following table in small groups...
  • 18. Ultra-conserved elements (UCRs) ▶ Highly conserved sequence (in mammals) Bejerano et al. (2004) Ultraconserved Elements in the Human Genome. Science. Snetkova et al. (2022) Perfect and imperfect views of ultraconserved sequences. Nature Reviews Genetics. UCNEbase: ultra-conserved non-coding elements database
  • 19. Human accelerated regions (HARs) Positive selection since humans diverged from the human and chimp ancestor. ▶ Whalen & Pollard (2022) Enhancer Function and Evolutionary Roles of Human Accelerated Regions. Annual Review of Genetics.
  • 20. Example: HAR1 ▶ Compare genome sequences, find otherwise conserved regions that are evolving rapidly in humans HAR1 in particular, appears to be involved in cortical development ▶ Pollard et al. (2006) Forces Shaping the Fastest Evolving Regions in the Human Genome. PLOS Genetics. ▶ Pollard et al. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature. Scale chr20: GWAS Catalog Chimp Gorilla Orangutan Gibbon Rhesus Crab-eating_macaque Baboon Green_monkey Marmoset Bushbaby Chinese_tree_shrew Squirrel Lesser_Egyptian_jerboa Prairie_vole Chinese_hamster Mouse Rat Naked_mole-rat Guinea_pig Chinchilla Brush-tailed_rat Rabbit Pika Pig Alpaca Bactrian_camel Killer_whale Tibetan_antelope Cow Sheep Domestic_goat White_rhinoceros Cat Dog Ferret_ Panda Pacific_walrus Weddell_seal Black_flying-fox Megabat David’s_myotis_(bat) Little_brown_bat Big_brown_bat Hedgehog Shrew Elephant Cape_elephant_shrew Manatee Cape_golden_mole Tenrec Armadillo Opossum Wallaby Chicken Common dbSNP(153) SINE LINE LTR DNA Simple Low Complexity Satellite RNA Other Unknown 100 bases hg38 63,102,050 63,102,100 63,102,150 63,102,200 63,102,250 63,102,300 63,102,350 63,102,400 63,102,450 Your Sequence from Blat Search GENCODE V39 (4 items filtered out) Basic Gene Annotation Set from GENCODE Version 39 (Ensembl 105) Pseudogene Annotation Set from GENCODE Version 39 (Ensembl 105) NHGRI-EBI Catalog of Published Genome-Wide Association Studies 100 vertebrates Basewise Conservation by PhyloP Multiz Alignments of 100 Vertebrates Short Genetic Variants from dbSNP release 153 Repeating Elements by RepeatMasker HAR1B HAR1B HAR1B ENSG00000274915 HAR1A HAR1B HAR1B HAR1B ENSG00000274915 HAR1A Wikipedia article: Human accelerated regions HAR1 in the UCSC genome browser.
  • 21. 7SL RNA ▶ Also called SRP RNA, part of the signal recognition particle, involved in protein transport. ▶ Universally conserved: orthologues are found in archaea, bacteria and eukaryotes. [Figure: 7SL in the UCSC genome browser (hg38, chr14:49,586,450-49,587,050), showing the RN7SL1 and RPS29 annotations (GENCODE V39) alongside the GWAS Catalog, 100-vertebrate conservation, Multiz alignment, dbSNP 153 and RepeatMasker tracks.] Wikipedia article: Signal recognition particle
  • 22. Alu elements (I) ▶ Transposable element, derived from 7SL RNA. Over one million copies in the human genome ▶ Breaks a LOT of bioinformatics tools, frequently masked from genome sequences Wikipedia article: Alu element Dfam entry for the AluY family
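As a minimal sketch of what "masked" means in practice: the code below assumes the input follows the common soft-masking convention, in which repeat bases are written in lowercase, and shows how such a sequence could be summarised or hard-masked (repeats replaced with N). The function names and the toy sequence are hypothetical.

```python
def masked_fraction(seq):
    """Fraction of bases that are soft-masked (lowercase) or hard-masked (N)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base in "Nn")
    return masked / len(seq)


def hard_mask(seq):
    """Replace every soft-masked (lowercase) base with N."""
    return "".join("N" if base.islower() else base for base in seq)


# Hypothetical toy sequence: a lowercase "repeat" flanked by unique sequence
toy_seq = "ACGTggccgggcgcggtgACGT"
print(round(masked_fraction(toy_seq), 3))
print(hard_mask(toy_seq))
```

Hard-masking stops alignment tools from seeding matches in repeats, at the cost of hiding any repeat-derived sequence that might itself be of interest.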
  • 23. Alu elements (II) ▶ Is the Alu element on human chromosome 14, between 31,474,114 and 31,473,798 likely to be functional? [Figure: An Alu element in the UCSC genome browser (hg38, chr14:31,473,750-31,474,150), showing the GENCODE V39, GWAS Catalog, 100-vertebrate PhyloP conservation, Multiz alignment, dbSNP 153 and RepeatMasker tracks.] ▶ Mills et al. (2007) Which transposable elements are active in the human genome?
  • 24. The main points ▶ “Function” can be measured in a variety of ways: ▶ Phenotypic: “biochemical” activity ▶ Evolutionary: evidence of positive/negative selection ▶ Genetic: knockout or mutation and compensation/complement ▶ A Darwinian definition of “function”: phenotype/activity AND evolutionary evidence (SE, CNE or Exaptation) ▶ Evolutionary evidence can be varied: usually based on sequence conservation or reduced SNPs ▶ Some suggest requiring overlapping evidence of biochemical activity & evolutionary conservation (i.e. regions under negative selection).
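The "activity AND evolutionary evidence" rule can be sketched as a small decision function. Everything here is an illustrative assumption: the field names, the cutoffs and the output labels are hypothetical, and only negative-selection evidence (conservation or reduced SNP density) is considered, so positively selected regions such as HARs would need a separate test.

```python
from dataclasses import dataclass


@dataclass
class RegionEvidence:
    biochemical_activity: bool  # e.g. reproducibly transcribed or bound
    mean_conservation: float    # e.g. a mean per-base conservation score
    snp_density_ratio: float    # SNPs per base in region / in flanking sequence


def classify(e, conservation_cutoff=1.0, snp_ratio_cutoff=0.5):
    """Combine activity and selection evidence; cutoffs are arbitrary examples."""
    selection = (e.mean_conservation >= conservation_cutoff
                 or e.snp_density_ratio <= snp_ratio_cutoff)
    if e.biochemical_activity and selection:
        return "activity+selection"  # candidate functional element
    if e.biochemical_activity:
        return "activity-only"       # "biochemically functional" only
    if selection:
        return "selection-only"      # conserved, activity not yet observed
    return "no-evidence"


# Hypothetical regions, for illustration only
print(classify(RegionEvidence(True, 2.3, 0.3)))
print(classify(RegionEvidence(True, 0.1, 1.0)))
```

Under this sketch, a transcribed but unconserved region with typical SNP density is reported as "activity-only", which is the case where the ENCODE and Darwinian definitions disagree.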
  • 25. Self-evaluation exercises ▶ Compare and contrast the ENCODE definition of function to the Darwinian definition proposed by Doolittle & Brunet (2017). ▶ Outline a method for determining if a predicted mouse gene is functional. What are potential pitfalls for the approach? ▶ Where do human-accelerated regions, ultraconserved regions, Xist, BC1 RNA and junk DNA lie in the Doolittle & Brunet (2017) classification of functional DNA? ▶ Justify your answers!
  • 26. Further reading ▶ Doolittle & Brunet (2017) On causal roles and selected effects: our genome is mostly junk. BMC Biology. ▶ Kellis et al. (2014) Defining functional DNA elements in the human genome. PNAS. ▶ Neme et al. (2017) Random sequences are an abundant source of bioactive RNAs or peptides. Nature Ecology & Evolution. ▶ Post questions on the FAQ GoogleDoc: https://docs.google.com/document/d/1PQd dp7C 0cXA8SwUv- qrkTOj8c8fUAt-U Z5dg2yc8/edit?usp=sharing
  • 27. Genetics Mātai Ira Social Mixer ▶ When: Wednesday 29th March, 6.00pm to 8pm ▶ Where: BIG13, ground floor, Biochemistry ▶ Pizza & soft drinks provided! ▶ otago.ac.nz/genetics
  • 28. Near Rakiriri, Otago Harbour, 7 May 2020.