Bio-IT

Pek Yee Lum, Ph.D.
   VP Life Sciences
Today’s Topics
What are the problems we face with big data today in the drug
discovery and development world?

What problems can Ayasdi solve for you?

Why topology?

Patient stratification using topology

Unveiling our NGS pipeline and genome browser

Summary

                          © 2012 Ayasdi inc.
Big data, bigger problems
       for drug discovery and development
       • Ever-growing, complex and disparate datasets
       • Scalability issues
          • NGS raw data can be as large as 1 TB per sample
          • Accessing data is no longer simple for untrained users
          • New IT infrastructure is needed for every new problem
          • Every new data type needs custom tools
       • An integrated user experience does not exist today
          • Bioinformatics tools are not integrated
          • Analysis and visualization are disparate
       • Accelerating the discovery process requires rethinking the
            analysis workflow and streamlining its computational
            infrastructure
Problems in drug discovery and development   © 2012 Ayasdi inc.
Data analysis landscape today
       [Figure: collage of the tools and tasks in use today: R, Cytoscape,
       Database, Spotfire, Matlab, Math, Writing code, Cloud]
Biological complexity on top of
       data problems
       Diseases are often complex: many components work in synergy
       for disease manifestation.
               The need: identify perhaps not a single drug target
               but multiple targets that work as a network

       The human population is heterogeneous: drugs fail because of the
       inability to stratify the patient population
               The need: identify biomarkers that work for patient
               stratification, to decrease the risk of adverse events and
               lack of efficacy

Ayasdi Iris increases probability of success (POS) and
         shrinks time to market

         Discovery of subtle patterns in a sea of noisy data

         Handling of all data, large or small, on the cloud

         Fusing disparate data sets with ease

         Access to critical public data on demand

         Collaboration for all types of stakeholders on one
         platform


The Ayasdi solution                 © 2012 Ayasdi inc.
Solution: Math + CS + UX platform



          Data has Shape

                   and Shape has Meaning



Data has shape         © 2012 Ayasdi inc.
What is shape?

       [Figure, left panel] If age, weight and height were
       distributed randomly

       [Figure, right panel] In reality, age, weight and height are
       correlated, and the data has a shape

What is shape?

     Ayasdi Iris identifies the shape or pattern in data

Why Topological Data Analysis
       for drug discovery and development
       1. Coordinate-free representations are vital when studying data
          collected using different technologies: a lot of public data is
          available, from many studies done at different times, with
          different data types collected

       2. Deformation invariance introduces robustness into the analysis,
          which is important in the study of real-world data: biological
          heterogeneity is complex and needs an approach that is resistant
          to deformation (variation)

       3. Compressed representations are important when one is
          dealing with very large data sets: with high-dimensional omics data
          and next-generation sequencing getting more affordable, the amount of
          data is increasing exponentially
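
To make "compressed representation" concrete, here is a minimal sketch of a Mapper-style construction, a published topological data analysis technique. It is an illustration only: the slides do not describe how Iris is implemented, and the function names, the choice of filter (the x coordinate), and the parameters `n_intervals`, `overlap` and `eps` are all assumptions made for this example. The idea is to slice the data into overlapping bins of a filter value, cluster within each bin, and link clusters that share points.

```python
import numpy as np
from itertools import combinations


def single_linkage_clusters(points, indices, eps):
    """Split `indices` into connected components of the eps-neighborhood graph."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in remaining
                    if np.linalg.norm(points[i] - points[j]) <= eps]
            for j in near:
                remaining.remove(j)
                component.add(j)
                frontier.append(j)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_values, n_intervals=4, overlap=0.25, eps=0.2):
    """Return (nodes, edges). Each node is a set of point indices;
    an edge joins two nodes that share at least one point."""
    lo, hi = filter_values.min(), filter_values.max()
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Overlapping interval of the filter range (hypothetical cover choice).
        start = lo + k * width - overlap * width
        stop = lo + (k + 1) * width + overlap * width
        in_bin = np.where((filter_values >= start) & (filter_values <= stop))[0]
        nodes.extend(single_linkage_clusters(points, in_bin.tolist(), eps))
    edges = [(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]]
    return nodes, edges


# Toy input (made up for illustration): 100 points evenly spaced on a circle.
theta = 2 * np.pi * np.arange(100) / 100
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(circle, filter_values=circle[:, 0])
print(len(nodes), "nodes and", len(edges), "edges summarize", len(circle), "points")
```

The 100 input points collapse to a handful of nodes whose links still form a loop, so the summary is far smaller than the data yet keeps its shape. No coordinates are stored in the graph itself, only which points group together.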


Why topology ?                        © 2012 Ayasdi inc.
Ayasdi Iris

           Uses principles of geometry to find shape (pattern)
           in data

           Works across data types, and for any type of data

           Works with any amount of data

           Generates and validates hypotheses

           Quick, interactive results

Patient stratification - Results
        Gene expression profiling of breast tumors
        Identified a sub-group of triple-negative patients
        with very good prognosis

        Identified a sub-group of patients that are Luminal A
        but with perfect survival (published in PNAS, 2011)

        These sub-groups were identified in independent datasets

        These sub-groups were hard to find using conventional
        methods

Patient stratification           © 2012 Ayasdi inc.
Topological map of patient-patient relationships according to the molecular
characteristics of their tumors (in this case, gene expression)
    [Figure: network of nodes colored by a color scheme, with these annotations]
    • Each node contains a subset of patients
    • Some patients are close to the center of the data
    • Some patients are eccentric (away from the center of the data)

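
The slide describes patients as "eccentric" when they lie away from the center of the data but gives no formula. One common way to score this, shown here purely as an assumption, is the mean distance from each point to all the others. The function name and the toy profiles below are made up for illustration.

```python
import numpy as np


def eccentricity(points):
    """Mean Euclidean distance from each row to every row (itself included).
    Larger values mean the row sits farther from the bulk of the data."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.mean(axis=1)


# Toy profiles (hypothetical): four rows near the origin and one outlier.
profiles = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.0, 0.1],
    [-0.1, 0.0],
    [3.0, 3.0],
])
scores = eccentricity(profiles)
most_eccentric = int(np.argmax(scores))
print("most eccentric row:", most_eccentric)
```

A score like this can serve as the filter that spreads a map out from central to eccentric points; other choices (for example, maximum distance instead of mean) are equally plausible.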
    [Figure: topological map colored by a color scheme. Regions are labeled
    "Zero event death", "Low event death", "High event death" and
    "Mixed event death". A callout marks the patients that are eccentric
    (away from the center of the data).]

    [Figure: topological map with regions labeled A, B, C, D and E.
    Annotations: "patients did not survive 10 yrs", "patients survived 10 yrs",
    and "Triple Negative: HER2-, ESR-, PGR-".]

Topological maps from two independent cancer data sets are very similar
    [Figure: two topological maps]
    • NKI data: colored from "death" to "survived"; arms labeled "high ESR1" and "low ESR1"
    • GSE2304: colored from "relapsed" to "no relapse"; arms labeled "high ESR1" and "low ESR1"

Conventional Methods
                       subtypes difficult to identify
    [Figure: two panels, "Clustering" and "PCA", with patients marked
    "ER- did not survive" and "ER- survived"]

Next Generation Sequencing
unveiling our solution

DNA-Seq Pipeline
BAMs → VCF → IRIS

RNA-Seq Pipeline
BAMs → cuffcomp/RPKM → IRIS

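
The RNA-Seq pipeline above names RPKM without defining it. As a reminder, RPKM (reads per kilobase of transcript per million mapped reads) normalizes a raw read count by gene length and sequencing depth. The helper below is a minimal sketch of that standard formula; the function name and the example numbers are made up, and nothing here reflects how the Ayasdi pipeline itself computes the value.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bp long, in a 10-million-read sample.
example_value = rpkm(500, 2_000, 10_000_000)
print(example_value)
```

Because the result is depth- and length-normalized, values from different samples and genes can sit in one matrix for downstream analysis.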
1000 Genomes DNA-seq data
population classification
    [Figure: plot with three groups labeled Africans, East Asians and Caucasians]

Identification and visualization of significant
exon variants using PCA
Han Chinese and Japanese cannot be easily distinguished using
PCA
    [Figure: PCA plot; Han Chinese in grey, Japanese in red]

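
For readers unfamiliar with the baseline, the following is a minimal sketch of a PCA projection of a genotype matrix, assuming variants are coded as 0, 1 or 2 copies of the alternate allele. The matrix is toy data invented for illustration, not the 1000 Genomes data shown on the slide, and `pca_project` is a hypothetical helper name.

```python
import numpy as np


def pca_project(matrix, n_components=2):
    """Project rows onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Toy genotypes (made up): 6 samples x 4 variants, alternate-allele counts.
genotypes = np.array([
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [1, 1, 2, 0],
    [2, 0, 0, 1],
    [2, 0, 1, 1],
    [1, 0, 0, 2],
], dtype=float)
coords = pca_project(genotypes)
print(coords.shape)
```

PCA keeps only the directions of largest overall variance, so groups that differ at a few variants can overlap in the first two components, which is the limitation this slide illustrates.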
Identification and visualization of significant
variants
Han Chinese and Japanese can be easily distinguished and
visualized with TDA
    [Figure: topological map with groups labeled Han Chinese and Japanese;
    callout: rs2294008, TT, chr8]

Poster #3: Navigating Next Generation Sequencing Data using Topological Data Analysis and Iris

Ayasdi Iris platform
       Ayasdi Iris Life Sciences Edition
          • Interactive Network Visualization
          • Integrated Statistics Algorithms
          • Integrated Public Datasets: PubMed, GO, KEGG, PPI, GEO

       Ayasdi Iris Cloud Platform
          • Analysis and Visualization: Topological Mapping, Network Analysis,
            Projections (e.g. PCA), Network Visualization,
            Histograms/Scatterplots, Dendrograms
          • Scalable Distributed Datastore
          • Proprietary and Public data sources: Gene Expression, mRNA, SNP,
            Clinical, NGS, PubMed

       • All analyses performed on the cloud on secure servers
       • Bypass the need to invest in expensive hardware and
         database administration
       • Flexibility to start your analysis immediately: just upload
         your data
       • Access to public data on demand

Ayasdi Iris platform                                                                                         © 2012 Ayasdi inc.
Contact us for more information
www.ayasdi.com
pek@ayasdi.com
info@ayasdi.com
2011 Big Data - Bigger Problems for Drug Discovery and Development
