Applications of Deep Learning in Basic Biology Research
Charlene Hsuan-Lin Her
12/28/2015
Outline
• Motivation
• What is Deep Learning: A brief review of the history of machine learning and AI
• Example: The human splicing code reveals new insights into the genetic determinants of disease
• Conclusion
Motivation
The goal of AI: build a machine that understands the world around us.
Attempts
Attempt 1:
Attempt 2: getting features (wheels, handle)
Inputs → Feature representation → Learning algorithm
• Feature representation requires domain knowledge
• Very task specific
• Difficult
Our brain UNDERSTANDS the world better than ANY algorithm
• The neural-rewiring experiment: the one-algorithm hypothesis
- Difficult to train (computationally expensive, needs MASSIVE labelled data)
Deep Learning
• Semi-supervised learning
• What we already know about the brain
• Sparse distributed representation
• Unsupervised feature learning
Reference
• Andrew Ng: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning (UCLA graduate summer school)
• Geoffrey Hinton: The Next Generation of Neural Networks (Google Tech Talks)
• Yoshua Bengio: Deep Learning (Machine Learning Summer School 2014)
• Deep Learning: The Theoretician's Nightmare or Paradise? (LeCun, NYU, August 2012)
Example: The human splicing code reveals new insights into the genetic determinants of disease (Xiong et al., 2015)
Question
Genetic variants → disease?
• Exonic: directly alter protein sequence
• Intronic, or synonymous mutation: ? (splicing)
Previous approach (Barash et al., 2010): regulatory model
Study design
• Train the model: mined 10,689 exons that displayed evidence of alternative splicing and extracted 1,393 sequence features from each exon and its neighboring introns and exons
• The model must not contradict current molecular biology knowledge: RBP binding ability, RBP expression, context-dependent effect of splicing codes, RBP knockdown data, blood samples from 4 individuals
• Linking it to disease SNVs: autism, spinal muscular atrophy, nonpolyposis colorectal cancer
Model
Linear model: R^2 = 0.66
High vs. low (33%): AUC = 95.5%
High vs. low (10%): AUC = 99.1%
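A minimal sketch of how a high-vs-low AUC like the ones on this slide could be computed. The slide does not define the two classes, so the cutoffs used here (observed Ψ ≤ 0.33 as "low", Ψ ≥ 0.67 as "high") and the function name `auc_high_vs_low` are assumptions for illustration, not the authors' procedure. The AUC is taken as the probability that a randomly chosen high exon gets a larger predicted Ψ than a randomly chosen low exon.

```python
def auc_high_vs_low(observed_psi, predicted_psi, low_cut=0.33, high_cut=0.67):
    """AUC for separating high-inclusion from low-inclusion exons.

    The cutoffs are hypothetical; exons between them are ignored.
    """
    pairs = list(zip(observed_psi, predicted_psi))
    high = [pred for obs, pred in pairs if obs >= high_cut]
    low = [pred for obs, pred in pairs if obs <= low_cut]
    if not high or not low:
        raise ValueError("need at least one high and one low exon")

    # Count (high, low) pairs ranked correctly; ties count as half.
    wins = 0.0
    for h in high:
        for l in low:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(high) * len(low))


# Made-up toy values, not data from the paper.
observed = [0.9, 0.8, 0.1, 0.2, 0.5]
predicted = [0.7, 0.4, 0.3, 0.5, 0.9]
print(auc_high_vs_low(observed, predicted))  # 0.75
```

An AUC of 0.5 would mean the predictions carry no information about the class; 1.0 would mean every high exon is ranked above every low exon.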
RNA-binding protein (RBP) vs. residual splicing activity
• Residual splicing activity = observed Ψ - predicted Ψ
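The definition above can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the numbers are made up, and plain Pearson correlation stands in for whatever statistic the paper actually used to relate RBP expression to the residual.

```python
import math


def residual_splicing_activity(observed_psi, predicted_psi):
    """Residual = observed PSI minus predicted PSI, one value per tissue."""
    return [obs - pred for obs, pred in zip(observed_psi, predicted_psi)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Made-up values for one exon across three tissues.
observed = [0.8, 0.5, 0.3]
predicted = [0.7, 0.5, 0.5]
residual = residual_splicing_activity(observed, predicted)

# Hypothetical RBP expression in the same three tissues.
rbp_expression = [1.0, 0.0, -2.0]
print(pearson(rbp_expression, residual))
```

A strong correlation between an RBP's expression and the residual would suggest that the model has not captured that RBP's effect; a weak one suggests the effect is already accounted for.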
Trans-acting factors (RBPs)
• MBNL RBP knockdown: altered splicing (ΔΨ >> 0) vs. exons that are not affected (ΔΨ ≈ 0)
• MBNL feature model: predicted ΔΨ
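The speaker notes describe scoring exons by how much the predicted Ψ changes when the MBNL features are removed in silico. A rough sketch of that idea follows. The logistic `predict_psi` is only a toy stand-in for the real splicing model, and the feature names and weights are hypothetical.

```python
import math


def predict_psi(features, weights, bias=0.0):
    """Toy stand-in model: logistic function of a weighted feature sum."""
    logit = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-logit))


def in_silico_removal_score(features, weights, removed, bias=0.0):
    """Predicted change in PSI after zeroing out the named features."""
    perturbed = {
        name: (0.0 if name in removed else value)
        for name, value in features.items()
    }
    return predict_psi(perturbed, weights, bias) - predict_psi(features, weights, bias)


# Hypothetical weights and two made-up exons.
weights = {"mbnl_site_upstream": 2.0, "other_a": -1.0, "other_b": 0.5}
exon_with_site = {"mbnl_site_upstream": 1.0, "other_a": 1.0, "other_b": 0.0}
exon_without_site = {"mbnl_site_upstream": 0.0, "other_a": 1.0, "other_b": 2.0}

for exon in (exon_with_site, exon_without_site):
    print(in_silico_removal_score(exon, weights, {"mbnl_site_upstream"}))
```

In this toy setup, only the exon that carries the MBNL feature gets a nonzero score, which is the behaviour one would compare against the knockdown data.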
SNV vs. ΔΨ
• Studied the effects of SNVs using the largest ΔΨ value across all tissues
SNV
• SNP: common
• MAF: rare and linked to disease
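A small sketch of the per-SNV score described in the first bullet. It assumes "largest value" means the ΔΨ with the largest magnitude across tissues, keeping its sign; that reading, the function name, and the numbers are assumptions.

```python
def snv_delta_psi(wild_type_psi, mutant_psi):
    """Return the per-tissue change in PSI with the largest absolute value.

    Both arguments hold one predicted PSI per tissue, in the same order.
    """
    deltas = [mut - wt for wt, mut in zip(wild_type_psi, mutant_psi)]
    return max(deltas, key=abs)


# Made-up predictions for one SNV in three tissues.
wild_type = [0.50, 0.60, 0.70]
mutant = [0.55, 0.30, 0.72]
print(snv_delta_psi(wild_type, mutant))  # about -0.3: the second tissue dominates
```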
SNV vs. ΔΨ: do disease SNVs disrupt splicing more frequently than common SNVs?
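One way to turn this question into a number, sketched under assumptions: the speaker notes use |ΔΨ| ≥ 5% as the cutoff for "disrupts splicing", and the ratio of disrupted fractions in the two groups gives an "N times as likely" figure. The function names and scores below are made up for illustration.

```python
def fraction_disrupting(delta_psi_scores, threshold=0.05):
    """Fraction of SNVs whose |delta PSI| reaches the threshold."""
    hits = sum(1 for score in delta_psi_scores if abs(score) >= threshold)
    return hits / len(delta_psi_scores)


def enrichment(disease_scores, common_scores, threshold=0.05):
    """How many times as likely disease SNVs are to disrupt splicing."""
    common_fraction = fraction_disrupting(common_scores, threshold)
    if common_fraction == 0:
        raise ValueError("no common SNV reaches the threshold; ratio undefined")
    return fraction_disrupting(disease_scores, threshold) / common_fraction


# Made-up scores, not data from the paper.
disease = [0.10, -0.20, 0.01, 0.06]
common = [0.01, 0.00, -0.08, 0.02]
print(enrichment(disease, common))  # 3.0
```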
SNV vs. ΔΨ
Spinal muscular atrophy (autosomal recessive)
Spinal muscular atrophy (autosomal recessive)
Nonpolyposis colorectal cancer (oligogenic)
Autism spectrum disorder (multigenic)
Autism spectrum disorder (multigenic)
Conclusion
• We built a model to address how genetic variation affects splicing
• The disease variants have regulatory scores significantly different from those of the rare and common variants, but the distribution of regulatory scores is indistinguishable for rare and common variants
• Potential sources of prediction error include unaccounted-for RNA features, inaccuracies in computed features, imperfect modeling of splicing levels, and limitations due to a focus on cassette splicing
• It will be important to seek regulatory models that encompass other major steps in gene regulation
Reference
• 1. Barash, Y. et al. Deciphering the splicing code. Nature 465, 53-59 (2010).
• 2. Xiong, H.Y. et al. RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
• Neural nets are supposed to do what humans are good at...
• HOW will these models help biologists understand the world better?
• Challenges
  • Validation
  • Insufficient information

Editor's Notes

  • #6: What do we want our computers to do? The goal of AI is to build a machine that understands the world around us. Why is it hard? All the effort goes into generating better features. First attempt: a linear classifier. Feature representation: edges. The one-algorithm hypothesis: our brain UNDERSTANDS the world better than ANY algorithm (the neural-rewiring experiment). What we already know about the brain: sparse coding, RBMs... An algorithm that automatically learns good features representing the data. Example, computer vision: sparse coding instead of pixels. Example, audio: matches biological data. Recurrence between layers makes more sophisticated features.
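  • #15: We all know that DNA has coding and non-coding regions, namely exons and introns, right? Some researchers believed that human SNPs are the key to why person A gets diabetes but person B does not, so they collected blood from many people to see where they differ. It turned out that many of the places where people differ do not produce protein (that is, what we like to call the phenotype). So that is a real headache: how exactly do these differences make people sick?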
  • #17: The number of strong correlations dropped to 60, which suggests that our computational model mostly encompasses the collective effects of known RBPs (Fig. 2)
  • #18: Knockdown data for Muscleblind-like (MBNL) RBPs in HeLa cells: 664 exons that exhibited a significant change in RNA-seq-assessed Ψ upon MBNL knockdown, as well as 26,457 exons whose levels did not change significantly upon knockdown. When exons were scored according to how much the model predicted that Ψ would change when the MBNL features were removed in silico, MBNL-regulated exons frequently had higher scores. The model identified them more accurately than direct examination of MBNL binding sites (10.9% improvement in the AUC; P = 1.4 × 10^-14). The model also includes the effect of trans-acting regulatory elements.
  • #19: Examined whether disease SNVs are predicted to disrupt splicing (|ΔΨ| ≥ 5%) more frequently than common SNPs.
  • #20: SNVs that disrupt splicing (|ΔΨ| ≥ 5%; table S4), frequently in a way that depends on cis context (fig). Intronic disease SNVs that are more than 30 nt from any splice site are 9.0 times as likely to disrupt splicing regulation relative to common SNPs in the same region. Within exons, synonymous disease SNVs are on average 9.3 times as likely as synonymous SNPs to disrupt splicing regulation. Missense disease SNVs are not more likely to disrupt splicing than missense SNPs. SNVs that minimally alter protein function are on average 5.6 times as likely to disrupt splicing regulation.
  • #21: The scores of disease SNVs are significantly higher (P < 1 × 10^-320, KS test, 71.2%, n = 280,638). Fewer than 5% of GWAS SNPs are estimated to cause misregulation in a fashion similar to disease SNVs (13), indicating that our method can detect disease SNVs that are not detectable by GWAS. (B) Scores of SNVs with strong experimental evidence are substantially higher than those with weak or indirect evidence (Fig. 4B).
  • #22, #23: Inclusion of exon 7 in SMN2 and loss of function. First examined the regulatory consequences of four nucleotides that differ between SMN1 and SMN2 (red lightning bolts). These substitutions are known to lead to decreased inclusion of exon 7 in SMN2 and loss of function. Mutagenesis data indicate that A100G enhances skipping by 36% to 63%. Gain of function in SMN2: green lightning bolt.
  • #25, #26, #27: Exon 7 skipping is predominantly caused by C6T and to a much lesser degree by G-44A, whereas A100G and A215G are predicted not to have a significant impact on splicing.