Lecture12

MicroRNA Detection
Khan Shing
CS374
May 8, 2008

Source: Science 2 September 2005: Vol. 30

Outline
• Biological background
◦ Gene regulation
◦ microRNAs
• microRNA detection
◦ Random forests
◦ Comparative genomics
• microRNA target recognition
◦ Site accessibility

Information Flow
Source: http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology

Gene Regulation
• Transcriptional regulation
◦ Enhancers, promoters, transcription factors, epigenetic modifications
• Post-transcriptional regulation
◦ mRNA processing, small RNAs
• Post-translational regulation
◦ Protein activation, inhibition, degradation

Source: Stark A. et al. 2007. Systematic discovery and characterization of
fly microRNAs using 12 Drosophila genomes. Genome Res.
microRNA
• RNA, like protein, can fold: it has primary, secondary, and tertiary structure
• The secondary (hairpin) structure is crucial to the processing of small RNAs
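To make "secondary structure" concrete, here is a minimal sketch that folds an RNA with the Nussinov algorithm, which simply maximizes the number of nested base pairs. This is a deliberately simplified stand-in chosen for illustration, not the method of the cited work; miRNA-detection pipelines rely on energy-based folding tools instead. The function name and the minimum hairpin-loop length of 3 are our own assumptions.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases inside a hairpin loop


def nussinov(seq: str) -> tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket structure)."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs within seq[i..j]

    def pair_score(i: int, k: int, j: int) -> int:
        # Score if k pairs with j: best for seq[i..k-1] + best for seq[k+1..j-1] + 1.
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue  # too short to contain a pair
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS and pair_score(i, k, j) == dp[i][j]:
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGGAAAACCCC"))  # (4, '((((....))))')
```

The toy sequence folds into a single hairpin: a four-pair G-C stem closed by a loop of four A's, the same stem-loop shape that miRNA precursors adopt.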

Source: Zamore, P.D. and Haley, B. 2005. Ribo-gnome: The big world of
small RNAs. Science 309: 1519–1524.
miRNA Processing

Source: Zamore, P.D. and Haley, B. 2005. Ribo-gnome: The big world of
small RNAs. Science 309: 1519–1524.
miRNAs Suppress Gene Expression

microRNA Detection
Stark A. et al. 2007. Systematic discovery
and characterization of fly microRNAs
using 12 Drosophila genomes. Genome
Res. doi:10.1101/gr.6593807.

Source: Leo Breiman, Random Forests, Machine Learning, v.45 n.1, p.5-32, October 1 2001.
microRNA Detection
• Machine learning approach
◦ Find characteristics (features) that distinguish miRNAs
◦ Use these features to train a model
• Random forests
◦ A collection of many independently constructed classification trees
◦ Each tree “votes,” and the tallied votes yield a score
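The voting step can be sketched as follows. Each tree is modeled as any function that returns 1 ("miRNA") or 0 ("not miRNA"), and the score is the fraction of trees voting 1. The three toy "trees" are hypothetical single-threshold rules on made-up feature names, not the features used by Stark et al.

```python
from typing import Callable, Mapping, Sequence

Features = Mapping[str, float]
Tree = Callable[[Features], int]  # returns 1 for "miRNA", 0 otherwise


def forest_score(trees: Sequence[Tree], x: Features) -> float:
    """Fraction of trees that vote 'miRNA' for the feature vector x."""
    votes = sum(tree(x) for tree in trees)
    return votes / len(trees)


# Hypothetical one-split "trees" on made-up feature names.
toy_trees = [
    lambda x: int(x["hairpin_energy"] < -25.0),
    lambda x: int(x["conservation"] > 0.8),
    lambda x: int(x["stem_length"] >= 20),
]

candidate = {"hairpin_energy": -30.0, "conservation": 0.9, "stem_length": 18}
print(forest_score(toy_trees, candidate))  # 2 of 3 trees vote yes
```

Because the score is a vote fraction rather than a hard label, candidates can be ranked by it and a cutoff chosen afterwards.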

Source: http://www.gmupolicy.net/its/incidentduration/image351.gif
How to Classify Objects?

Source: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Random Forest
Given N cases in the training set and M input variables, each tree is grown as follows:
• Sample N cases at random, with replacement, from the original data (a bootstrap sample). This sample is the training set for growing the tree.
• At each node, select m variables (m << M) at random out of the M, and split the node using the best split on these m. The value of m is held constant while the forest is grown.
• Grow each tree to the largest extent possible. There is no pruning.
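The three steps above can be sketched in plain Python. This is a minimal illustration, not Breiman's reference implementation: it assumes numeric features, 0/1 labels (1 = miRNA), Gini impurity as the split criterion, and, as our own choice, a fallback to all M variables when the m sampled ones are constant at a node. All function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Best (feature, threshold) over the given features by weighted Gini, or None."""
    best, best_score = None, float("inf")
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, m, rng):
    """Grow an unpruned tree, trying m randomly chosen variables at each node."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    M = len(X[0])
    split = best_split(X, y, rng.sample(range(M), m))
    if split is None:  # assumption: fall back to all M variables if the m are constant here
        split = best_split(X, y, range(M))
    if split is None:  # identical inputs with mixed labels: majority-vote leaf
        return Counter(y).most_common(1)[0][0]
    f, t = split
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in left_idx], [y[i] for i in left_idx], m, rng),
            grow_tree([X[i] for i in right_idx], [y[i] for i in right_idx], m, rng))


def predict_tree(node, x):
    """Follow splits (f, t, left, right) until reaching a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def grow_forest(X, y, n_trees, m, seed=0):
    """Train each tree on a bootstrap sample of N cases drawn with replacement."""
    rng = random.Random(seed)
    N = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(N) for _ in range(N)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], m, rng))
    return forest


def forest_score(forest, x):
    """Fraction of trees voting for class 1 (labels are assumed to be 0/1)."""
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)


# Made-up toy data: two features, class 1 = "miRNA".
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
y = [0, 0, 0, 1, 1, 1]
forest = grow_forest(X, y, n_trees=50, m=1)
print(forest_score(forest, [12.0, 12.0]))
```

Random feature subsets and bootstrap samples make the trees differ from one another, so averaging their votes reduces variance even though each unpruned tree overfits its own sample.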

Source: http://www.jfsowa.com/figs/bintree.gif
Random Forest
• Trained on the RFAM data set of 60 cloned miRNAs and a random negative set (250 putative miRNA hairpins), using a variety of features
• 500 independently constructed trees

Source: CS262 Lecture 17, Win07, Batzoglou
Comparative Genomics

Structural Features
Compare the 60 cloned miRNAs in the RFAM database with ~760,000 random “miRNA-like” hairpins

Conservation Features

Discovery and validation of new miRNAs
No single feature provides enough discriminatory power on its own, but combined in the trained model they give ~4500-fold enrichment
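Fold enrichment can be read as the rate of true miRNAs among the candidates a score cutoff selects, divided by their rate in the whole candidate pool. The slides do not spell out the definition, so the sketch below assumes this one, and its counts are made up for illustration only.

```python
def fold_enrichment(true_selected: int, n_selected: int,
                    true_total: int, n_total: int) -> float:
    """(true / selected) divided by (true / total) for a given score cutoff."""
    precision = true_selected / n_selected  # miRNA rate above the cutoff
    background = true_total / n_total       # miRNA rate in the whole pool
    return precision / background


# Hypothetical counts, not from the paper: 50 of the 100 top-scoring hairpins are
# real, out of 60 real miRNAs hidden in a pool of 600,000 hairpins.
print(fold_enrichment(50, 100, 60, 600_000))
```

With these toy numbers the cutoff concentrates real miRNAs roughly 5,000-fold relative to picking hairpins at random.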

Discovery and validation of new miRNAs
• Rank all 760,355 putative miRNAs by this combined score
• Finds 41 novel miRNA candidates
• Validate the candidates by sequencing and other methods
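Ranking by the combined score amounts to a sort; when only the top candidates are needed, a heap avoids sorting all ~760,000 entries. A sketch, assuming each candidate is a hypothetical (hairpin_id, score) pair and a cutoff k chosen by the user:

```python
import heapq


def top_candidates(scored, k):
    """Return the k highest-scoring (candidate_id, score) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical hairpin IDs and scores.
scored = [("hp1", 0.12), ("hp2", 0.97), ("hp3", 0.55), ("hp4", 0.88)]
print(top_candidates(scored, 2))  # [('hp2', 0.97), ('hp4', 0.88)]
```

`heapq.nlargest` runs in O(N log k), which matters little at this scale but keeps memory proportional to k.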

Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res.

Results
• Antisense strand miRNAs
• miRNA* sequences

Accurate Prediction of Mature miRNAs

microRNA Target Recognition
Kertesz, M., Iovino, N., Unnerstall, U., Gaul,
U. & Segal, E. The role of site accessibility
in microRNA target recognition. Nat.
Genet. 39, 1278–1284 (2007).

Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
Motivation for looking at site accessibility
• Existing methods for finding miRNA targets rely mostly on sequence specificity
• But miRNAs act as part of a protein complex, which has physical size, so a target site can be blocked by mRNA secondary structure
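For contrast, the sequence-only approach can be sketched as a seed search: find positions in the target mRNA that are perfectly complementary to the miRNA seed. The sketch assumes the seed is nucleotides 2–8 of the miRNA (a common definition) and ignores G:U wobble, conservation and, crucially, site accessibility. The example miRNA is let-7a; the UTR string is made up.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A-U, G-C pairing only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna: str, utr: str, start: int = 1, end: int = 8) -> list[int]:
    """0-based positions in utr that pair perfectly with mirna[start:end] (nt 2-8)."""
    site = reverse_complement(mirna[start:end])
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"
UTR = "AAACUACCUCAAAGGCUACCUCU"  # hypothetical target with two seed matches
print(seed_sites(LET7A, UTR))  # [3, 15]
```

Both hits look identical to this scan; whether either is actually reachable depends on how the surrounding mRNA folds, which is what the accessibility score addresses.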

Proof of Principle

How to use this fact?
• Develop an energy-based score to rate miRNA–target interactions
• ∆G: the free energy of molecular interactions
• ∆∆G: the difference between the free energy the system gains when a miRNA binds to its target and the free energy lost by unpairing the secondary structure of the mRNA target site
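A minimal sketch of the ∆∆G score. The sign convention is our assumption: free energies in kcal/mol with negative values favorable; ∆G_duplex is the free energy change of miRNA–target hybridization, and ∆G_open is the (negative) free energy of the site's own local structure that is lost when the site is unpaired. Then ∆∆G = ∆G_duplex − ∆G_open, and a more negative ∆∆G predicts stronger targeting. In practice both energies would come from an RNA folding package; the numbers below are made up.

```python
def delta_delta_g(dg_duplex: float, dg_open: float) -> float:
    """ddG = dG_duplex - dG_open in kcal/mol; more negative = more favorable site.

    dg_duplex: free energy change when the miRNA binds the site (<= 0).
    dg_open:   free energy of the site's local structure, lost on unpairing (<= 0).
    """
    return dg_duplex - dg_open


# Same hypothetical duplex energy at two sites: one nearly unstructured, one buried in a stem.
accessible = delta_delta_g(-20.0, -2.0)  # -18.0
buried = delta_delta_g(-20.0, -15.0)     # -5.0
print(accessible < buried)  # True: the accessible site scores better
```

Two sites with identical seed matches and duplex energies can therefore receive very different scores once the cost of opening the target structure is subtracted.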

Test how good ∆∆G is
∆∆G correlates well with repression in luciferase assays:
It does even better when flanking regions are included:

Comparison to other target predictors

References
Ruby J.G. et al. 2007. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. doi:10.1101/gr.6597907
Berezikov E., Thuemmler F., van Laake L.W., Kondova I., Bontrop R., Cuppen E. and Plasterk R.H. 2006. Diversity of microRNAs in human and chimpanzee brain. Nat. Genet. 38: 1375–1377.

Other figures
