The Role of Machine Learning in
Modelling the Cell.

      John Hawkins
      ARC Centre for Complex Systems
      University of Queensland
      Australia
Overview of Talk

   Overview of cell biology
   Modelling the cell
   Subcellular localisation signals
   Machine Learning in General
   Neural networks
       Feed Forward versus Recurrent
Cell Biology – Quick and Dirty
                      Membrane bound
                       Organelles
                      Nucleus
                      DNA -> RNA ->
                       Protein
                      Transport, e.g.
                        Mitochondria

                        Peroxisome

                      Modification, e.g.
                        Disulphide
                          Bond Formation
                        Glycosylation
Cell Feedback

   At a particular time point a set of genes
    will be expressed.
   These do not remain constant; instead,
    the emerging picture is that
       There is some essential cycle of gene
        expression
       With a capacity to indulge in alternative
        pathways of expression under external
        stimulus.
   The pattern of expression is
Modelling the cell
   Ideally we would like to model the cell
    from the level of a 3D physical
    simulation.
       Currently this is infeasible
   So numerous approaches are taken to
    form abstractions
       Gene Regulatory Networks
       Differential equation models of particular
        pathways
       Machine learning models of particular
Biological Sequences
   Many Important Biological Molecules are
    Polymers.
       Thus representable as a sequence of discrete
        symbols.
   Sequence M = [m1, m2, …, mn] where:
   DNA: mi ∈ { A, T, G, C }
   RNA: mi ∈ { A, U, G, C }
   Protein: mi ∈ { G, A, V, L, I, P, S, C, T, M, D,
    E, H, K, R, N, Q, F, Y, W }
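
As a minimal sketch of the alphabets above, the following Python checks that a sequence uses only symbols from its alphabet. The names `ALPHABETS` and `is_valid` are illustrative and do not come from the talk.

```python
# Alphabets as listed on the slide; the names used here are illustrative.
ALPHABETS = {
    "DNA": set("ATGC"),
    "RNA": set("AUGC"),
    "Protein": set("GAVLIPSCTMDEHKRNQFYW"),
}

def is_valid(sequence, kind):
    """Return True if every symbol of `sequence` belongs to the `kind` alphabet."""
    return set(sequence.upper()) <= ALPHABETS[kind]

print(is_valid("ATGGCC", "DNA"))   # True
print(is_valid("AUGGCC", "DNA"))   # False: U is not a DNA symbol
print(len(ALPHABETS["Protein"]))   # 20
```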
Information Content
   How much information in a linear sequence?
   Two crucial elements to function
       Physical/chemical properties
       Molecular shape
   Each residue has well known properties
   Denaturation (Anfinsen, 1973).
       Sequence defines arrangement of chemical
        properties which in turn defines folding.
Biological Patterns

   Motifs – General term for patterns
   Numerous Definitions & Visualisations
       PROSITE Patterns – Regular Expression
       PROSITE Profiles – Probability Matrix
       LOGOs
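
To illustrate the link between PROSITE patterns and regular expressions, here is a hedged sketch that translates a subset of the PROSITE pattern syntax into a Python regex. The function name `prosite_to_regex` is hypothetical, and the sketch ignores rarer syntax such as `>` inside square brackets. The example pattern is the PROSITE N-glycosylation site (PS00001); the test sequences are made up.

```python
import re

def prosite_to_regex(pattern):
    """Translate a (subset of) PROSITE pattern syntax into a regex string."""
    pattern = pattern.rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."                     # x matches any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]" # {P} means any residue except P
        # [ST] and single letters are already valid regex syntax.
        quantifier = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + quantifier + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                # N[^P][ST][^P]
print(re.search(regex, "MKNASAL").start())  # 2
print(re.search(regex, "MKNPSAL"))          # None
```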
Peroxisomal Localisation

   Predominantly controlled by a C-terminal
    sequence called the PTS1 signal.
   Roughly 12 residues long
   Known dependencies between
    locations
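
Because the signal sits at the C-terminus and is roughly 12 residues long, a predictor only needs the tail of each protein as input. The sketch below extracts that window; the function name, the padding symbol, and the example sequences are assumptions made for illustration.

```python
def c_terminal_window(protein, length=12, pad="-"):
    """Return the last `length` residues, left-padded if the protein is shorter.

    `length` is assumed to be a positive integer.
    """
    return protein[-length:].rjust(length, pad)

# Hypothetical sequences, used only to show the behaviour.
print(c_terminal_window("MAAAKGSLLQRTVSKL"))  # KGSLLQRTVSKL
print(c_terminal_window("SKL"))               # ---------SKL
```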
Nuclear Export
   Some proteins move continuously between the
    nucleus and cytoplasm of the cell.
   Either as:
       Transporters
       Regulators
Machine Learning
   Requires a set of examples, with
       Raw input (sequence data), and
       Known classes that the machine should
        predict
   In essence Function Approximation
       Start with a General parametrised
        function over the input data
       Adjust the parameters until the output of
        the function is a good approximation to
        the known classes of the examples.
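
The function-approximation view can be sketched in a few lines of Python. The example below assumes a single logistic unit as the parametrised function and plain gradient descent as the adjustment rule; neither choice is stated in the talk, and the toy examples are made up.

```python
import math

def predict(params, x):
    """General parametrised function: a logistic unit over the inputs."""
    w, b = params
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, n_inputs, rate=0.5, epochs=2000):
    """Adjust the parameters by gradient descent on the cross-entropy loss."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        for x, target in examples:
            error = predict((w, b), x) - target  # loss gradient w.r.t. z
            w = [wi - rate * error * xi for wi, xi in zip(w, x)]
            b -= rate * error
    return w, b

# Toy examples (made up): class 1 only when both inputs are 1.
examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
params = train(examples, n_inputs=2)
print([round(predict(params, x)) for x, _ in examples])  # [0, 0, 0, 1]
```

After training, the rounded outputs agree with the known classes, which is the sense in which the function "approximates" the examples.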
Bias

   Bias is generally unavoidable
       (Mitchell, 1980)
   Three Sources of Bias
       Input Encoding
       Function Structure (Architecture)
       Parameter adjustment algorithm (learning)
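
Input encoding is the easiest of the three sources to show concretely. The sketch below uses one-hot encoding, a common choice that the talk does not name, so treat it as an assumption. It treats every pair of residues as equally dissimilar, which is itself a bias, since it discards known physical and chemical similarities.

```python
AMINO_ACIDS = "GAVLIPSCTMDEHKRNQFYW"  # order taken from the protein alphabet

def one_hot(sequence):
    """Encode each residue as a 20-element vector containing a single 1."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vector = [0] * len(AMINO_ACIDS)
        vector[index[residue]] = 1
        encoded.append(vector)
    return encoded

window = one_hot("SKL")              # hypothetical three-residue input
print(len(window), len(window[0]))   # 3 20
print(window[0].index(1))            # 6, the position of S in AMINO_ACIDS
```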
Neural Networks
   Graphical Model consisting of layers of
    nodes connected by weights
   Feed forward neural networks
       Fixed input window
       Signal propagates in a single pass through the
        layers
   Recurrent Neural Networks
       Signal processed in parts
       Recurrent connections maintain a memory state
       Output generated after processing the last piece
        of the input signal
Simple Neural Networks




   FFNN: O_h = S(W1 ∙ I1 + W2 ∙ I2 + b)
   RNN:  O_h = S(W1 ∙ I2 + W2 ∙ S(W1 ∙ I1 + b) + b)
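
The two formulas on this slide can be written out directly. The sketch below assumes scalar inputs and weights and takes S to be the logistic sigmoid; the slide only names the function S, so that choice is an assumption. The function names and numeric values are illustrative.

```python
import math

def S(z):
    """Sigmoid squashing function (assumed; the slide only calls it S)."""
    return 1.0 / (1.0 + math.exp(-z))

def ffnn_hidden(i1, i2, w1, w2, b):
    """Feed forward: both inputs are seen at once through separate weights."""
    return S(w1 * i1 + w2 * i2 + b)

def rnn_hidden(inputs, w1, w2, b, state=0.0):
    """Recurrent: inputs arrive one at a time; w2 feeds the previous state back."""
    for i in inputs:
        state = S(w1 * i + w2 * state + b)
    return state

w1, w2, b = 0.8, -0.5, 0.1   # arbitrary illustrative parameters
i1, i2 = 1.0, 0.0
print(ffnn_hidden(i1, i2, w1, w2, b))
print(rnn_hidden([i1, i2], w1, w2, b))
```

With an initial state of zero, two steps of `rnn_hidden` reproduce the nested expression S(W1 ∙ I2 + W2 ∙ S(W1 ∙ I1 + b) + b), and the same loop handles inputs of any length, whereas `ffnn_hidden` is tied to a fixed window of two inputs.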
RNNs in Bioinformatics

   Bi-Directional RNN
Applications

   We have applied these techniques to
       Subcellular Localisation to
           Endoplasmic Reticulum
           Mitochondria
           Chloroplast
           Peroxisome
   http://pprowler.imb.uq.edu.au
   Working with whole genome data and
    wet lab biologists to use these tools for
    data mining.
The End…




           ?
