From data to insights and action: Strategies to take your bioinformatics to the next level
Eleanor Howe, Diamond Age Data Science
Huseyin Mehmet, Zafgen, Inc.
December 7, 2018
What is this talk about?
• Who are we? What is computational biology?
• Lessons learned from working with our customers
• Our ongoing relationship with Zafgen
• Q&A
Eleanor Howe, PhD
Background in molecular biology, statistics, programming and computational biology/bioinformatics
eleanor@diamondage.com
Diamond Age Data Science
www.diamondage.com
Bioinformatics/computational biology consulting
Project-based analysis
Staff augmentation
Pipeline development
“Drop-in” bioinformatics department
The Diamond Age: Or, A Young Lady’s Illustrated Primer, by Neal Stephenson
Team
Chris Friedline: sequencing, software engineering
Somdutta Saha: computational chemistry and proteomics
Bruce Romano: mathematics and data science
Nicholas Crawford: human genetics and GWAS
Mike DeRan: cancer and diabetes therapeutics, scRNA-seq
Max Marin: RNA splicing
Zarko Boskovic: medicinal chemistry and metabolomics
Chris Dwan: IT and data security
A few of our clients
Computational Biology
Computational biology is data science for biology.
Bioinformatics is sometimes a synonym for computational biology. Other times, bioinformatics refers to software engineering for biology.
Lessons learned
Drug discovery requires evaluation of diverse, complex data
• Sequence analysis is very different from proteomics
• Knowing the landscape of available datasets is key
• Individual bioinformaticians tend to specialize in one sub-field or another
Public datasets are a gold mine
• Cancer Cell Line Encyclopedia
• The Cancer Genome Atlas
• Gene Expression Omnibus
• Cancer Dependency Map (DepMap)
• UK Biobank
• DrugBank
• VarSome
• GTEx
But the real gems come from your own experiments
It’s not possible to validate a drug target using public datasets alone. Public datasets are general and cover only the most common diseases or disease subtypes.
The most useful results come from combining custom-generated data with public data.
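In practice, “combining” often starts with a join on a shared identifier. A minimal sketch, assuming both tables are keyed by gene symbol: it joins a hypothetical in-house fold-change table with a hypothetical public reference (for example, baseline tissue expression of the kind GTEx provides) and keeps genes that both respond in the experiment and are expressed in the tissue. Gene names, values, and thresholds are illustrative placeholders, not real data.

```python
from typing import Dict, List, Tuple

# Hypothetical in-house result: gene -> log2 fold change from a custom experiment.
in_house: Dict[str, float] = {"GENE_A": 2.1, "GENE_B": -0.4, "GENE_C": 1.3}

# Hypothetical public reference: gene -> baseline expression (e.g., TPM) in a tissue of interest.
public_reference: Dict[str, float] = {"GENE_A": 15.0, "GENE_C": 0.2, "GENE_D": 40.0}


def join_on_gene(
    custom: Dict[str, float], reference: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Inner join on gene identifier: keep only genes present in both sources."""
    shared = sorted(custom.keys() & reference.keys())
    return [(gene, custom[gene], reference[gene]) for gene in shared]


def prioritize(
    rows: List[Tuple[str, float, float]], min_fc: float = 1.0, min_baseline: float = 1.0
) -> List[str]:
    """Keep genes that respond in our experiment AND are expressed in the reference tissue.

    Both thresholds are illustrative assumptions, not recommendations.
    """
    return [g for g, fc, baseline in rows if abs(fc) >= min_fc and baseline >= min_baseline]


if __name__ == "__main__":
    merged = join_on_gene(in_house, public_reference)
    print(merged)              # [('GENE_A', 2.1, 15.0), ('GENE_C', 1.3, 0.2)]
    print(prioritize(merged))  # ['GENE_A']
```

GENE_C responds in the experiment but is barely expressed in the reference tissue, which is exactly the kind of context the public data adds.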
CROs do the basics well
• Ocean Ridge, Novogene ($200 transcriptome!)
• Good for the basics: RNA-seq, DNA-seq, proteomics, metabolomics
• Reasonable, standardized analysis pipelines
• Challenges:
  • combining multiple datasets across experiments or across CROs
  • more involved analyses (e.g., splicing)
• Do a thorough cost comparison when considering an academic collaborator
  • Also ask them when their student is graduating.
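A large part of why combining datasets across experiments or CROs is hard is batch effects: systematic shifts between runs that can swamp the biology. A minimal sketch, assuming log-scale expression and that each batch contains a comparable mix of samples, is to center every gene within every batch. This is a deliberately naive stand-in; established methods such as ComBat model batch effects more carefully, and centering will remove real biology if batches are confounded with conditions. The sample names and values are hypothetical, constructed so that CRO2 reads about 4 units higher than CRO1.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical log2 expression from two CROs; CRO2 is shifted ~4 units up across the board.
SAMPLES: List[dict] = [
    {"name": "cro1_s1", "batch": "CRO1", "expr": {"GENE_A": 5.0, "GENE_B": 8.0}},
    {"name": "cro1_s2", "batch": "CRO1", "expr": {"GENE_A": 7.0, "GENE_B": 10.0}},
    {"name": "cro2_s1", "batch": "CRO2", "expr": {"GENE_A": 9.0, "GENE_B": 12.0}},
    {"name": "cro2_s2", "batch": "CRO2", "expr": {"GENE_A": 11.0, "GENE_B": 14.0}},
]


def center_within_batches(samples: List[dict]) -> List[dict]:
    """Subtract each gene's per-batch mean so every batch is centered at 0 for every gene.

    Output is grouped by batch, in order of each batch's first appearance.
    """
    by_batch: Dict[str, List[dict]] = defaultdict(list)
    for s in samples:
        by_batch[s["batch"]].append(s)

    centered = []
    for batch, members in by_batch.items():
        genes = list(members[0]["expr"])  # assumes every sample measures the same genes
        batch_means = {g: mean(m["expr"][g] for m in members) for g in genes}
        for m in members:
            centered.append(
                {"name": m["name"], "batch": batch,
                 "expr": {g: m["expr"][g] - batch_means[g] for g in genes}}
            )
    return centered


if __name__ == "__main__":
    for s in center_within_batches(SAMPLES):
        print(s["name"], s["expr"])
```

After centering, the within-batch differences survive while the CRO-wide offset disappears.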
What additional expertise do you need?
Early-stage “traditional” therapeutics companies don’t need a full-time computational biologist; part time can work fine.
When the company expands, hire a computational biologist with substantial experience, or an analyst with access to an advisor.
Computational biologist: experience/training in all three areas (biology, programming, statistics)
Analyst: biology + programming, with an advisor to help with the statistics
Methods developer: wants to build new analytical tools
Know what you need.
What expertise do you need?
For teams:
• Cross-disciplinary expertise: biology, chemistry, computer science, statistics
• Communication skills
• Lateral thinking
Expertise gets you fast answers
The problem: get a terabyte of data from a USB hard drive to the cloud in time to analyze a dataset for a conference
The solution:
• Bicycle the drive across the Charles: a 3 Gb/s “bicycle” (latency of 1.2M ms, about 20 minutes)
• Upload over the datacenter internet connection at Markley Data Center
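The bicycle wins because physically carrying a drive has enormous effective bandwidth, just high latency. A rough sketch of the arithmetic, assuming a decimal terabyte and a hypothetical 100 Mb/s office uplink for comparison (the slide does not give the office connection speed). Counting only the 1,200 s ride, 1 TB works out to about 6.7 Gb/s; the slide does not say which steps its 3 Gb/s figure includes, but for reference, 1 TB in 45 minutes is roughly 3 Gb/s.

```python
# Rough "sneakernet" arithmetic: carrying a drive vs. uploading it.
# 1 TB and the 1.2M ms (1,200 s) ride come from the slide;
# the 100 Mb/s uplink is a hypothetical comparison point.

TB_BYTES = 10**12  # decimal terabyte


def effective_gbps(n_bytes: int, seconds: float) -> float:
    """Effective bandwidth in gigabits per second for moving n_bytes in `seconds`."""
    return n_bytes * 8 / seconds / 1e9


def upload_hours(n_bytes: int, link_mbps: float) -> float:
    """Hours to push n_bytes over a link_mbps (megabits/s) link, ignoring protocol overhead."""
    return n_bytes * 8 / (link_mbps * 1e6) / 3600


if __name__ == "__main__":
    ride_s = 1_200  # 1.2M ms
    print(f"bicycle, ride only: {effective_gbps(TB_BYTES, ride_s):.1f} Gb/s")  # 6.7 Gb/s
    print(f"100 Mb/s uplink:    {upload_hours(TB_BYTES, 100):.1f} h")          # 22.2 h
```

Under these assumptions the office upload would take most of a day, while the ride takes twenty minutes.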
Deep Learning / Artificial Intelligence
Another danger zone
Deep Learning / Artificial Intelligence
Deep learning is “new” only in the sense that it is a more complex version of an older technology: the neural network.
Modern compute power makes it practical to train powerful classifiers on very large datasets.
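To make “a more complex version of a neural network” concrete: a feed-forward network is a stack of layers, each an affine transform followed by a nonlinearity, and “deep” mostly means more (and wider) layers. A minimal NumPy sketch of the forward pass only, with random untrained weights and layer sizes chosen arbitrarily for illustration; training (fitting the weights by backpropagation) is omitted.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def init_layers(sizes, seed=0):
    """Random (He-style scaled) weights and zero biases for a fully connected network."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x: np.ndarray, layers) -> np.ndarray:
    """Hidden layers: affine transform + ReLU. Output layer: softmax class probabilities."""
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    logits = x @ W + b
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    shallow = init_layers([20, 2])           # no hidden layer: softmax (logistic) regression
    deep = init_layers([20, 64, 64, 64, 2])  # same building blocks, more layers
    X = np.random.default_rng(1).normal(size=(5, 20))  # 5 samples, 20 features
    print(forward(X, shallow).shape, forward(X, deep).shape)  # (5, 2) (5, 2)
```

The shallow and deep models differ only in how many layers are stacked, which is the sense in which deep learning extends older technology.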
The basics of machine learning (and DL)
Deep learning works in a similar way to other types of machine learning. The algorithms use larger datasets and are more complex, but the overall workflow is the same.
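The shared workflow (hold out test data, fit on the training set, score on data the model has never seen) can be written once and reused for any model. A minimal sketch, assuming NumPy and synthetic data; `split_train_test`, `fit_linear`, and `predict_linear` are hypothetical helpers, with a least-squares linear classifier standing in for any model, from a linear model to a deep network.

```python
import numpy as np


def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle samples and hold out a fraction for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]


def fit_linear(X, y):
    """Least-squares fit to labels recoded as -1/+1, with an intercept column."""
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, np.where(y == 1, 1.0, -1.0), rcond=None)
    return w


def predict_linear(w, X):
    A = np.hstack([X, np.ones((len(X), 1))])
    return (A @ w > 0).astype(int)


def evaluate(fit, predict, X, y, seed=0):
    """Same three steps for any model: split, fit on train, score accuracy on held-out test."""
    X_tr, X_te, y_tr, y_te = split_train_test(X, y, seed=seed)
    model = fit(X_tr, y_tr)
    return float((predict(model, X_te) == y_te).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(400, 10))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels
    print(f"held-out accuracy: {evaluate(fit_linear, predict_linear, X, y):.2f}")
```

Swapping in a different model means passing a different `fit`/`predict` pair; the evaluation logic does not change.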
Should you use deep learning?
Is your training data:
• Large? 100,000+ to 1M+ samples
• Well-annotated? Gene expression data usually isn’t.
• Representative of the questions you want to answer?
In discovery biology, the data usually isn’t there yet. Hence “discovery”.
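The three questions can be turned into a quick screening helper. This is a hypothetical sketch: the 100,000-sample floor is the slide’s lower bound, while the 90% labeling threshold is an assumption, and the `representative` flag is a judgment call that no code can make for you.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingData:
    n_samples: int
    n_labeled: int        # samples with trustworthy annotations
    representative: bool  # does the data reflect the question being asked? (human judgment)


def deep_learning_concerns(
    data: TrainingData, min_samples: int = 100_000, min_labeled_fraction: float = 0.9
) -> List[str]:
    """Return reasons deep learning looks premature; an empty list means no red flags.

    min_labeled_fraction is an illustrative assumption, not a published cutoff.
    """
    concerns = []
    if data.n_samples < min_samples:
        concerns.append(f"only {data.n_samples:,} samples (< {min_samples:,})")
    if data.n_samples == 0 or data.n_labeled / data.n_samples < min_labeled_fraction:
        concerns.append("too few well-annotated samples")
    if not data.representative:
        concerns.append("data may not represent the question being asked")
    return concerns


if __name__ == "__main__":
    # A hypothetical expression compendium: modest size, sparse annotation.
    print(deep_learning_concerns(TrainingData(n_samples=2_000, n_labeled=300, representative=True)))
    # ['only 2,000 samples (< 100,000)', 'too few well-annotated samples']
```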
Good use cases for deep learning
• Image processing
  • Diagnostics from histology, radiology
  • High-content screening
• Biochemical structure/sequence
  • Epitope prediction
  • Protein folding (DeepMind)
• Single-cell RNA-seq (potentially)
Should you use deep learning? (cont.)
• Do you need an interpretable model? Deep learning is a black box.
• Have you tried everything else? Linear models, random forests, and other ML techniques are often faster, cheaper, and easier to understand and implement.
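A short illustration of why simpler models are easier to understand: an ordinary least-squares fit exposes one coefficient per feature that can be read directly. A minimal NumPy sketch on synthetic data where, by construction, only the first two of five hypothetical features matter; random forests offer a comparable, if less direct, view through feature importances.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares; returns feature coefficients followed by the intercept."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def simulate(n: int = 500, seed: int = 0):
    """Synthetic data: only features 0 and 1 affect the response (by construction)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    true_coef = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
    y = X @ true_coef + rng.normal(scale=0.5, size=n)  # add noise
    return X, y, true_coef


if __name__ == "__main__":
    X, y, true_coef = simulate()
    coef = fit_ols(X, y)
    for i, c in enumerate(coef[:-1]):
        print(f"feature_{i}: {c:+.2f}")  # one directly readable weight per feature
    print(f"intercept: {coef[-1]:+.2f}")
```

The fitted weights land close to 2, -1, and 0, so a reader can see at a glance which features drive the prediction and in which direction.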
Using Bioinformatics Data to inform Therapeutics discovery and development
Huseyin Mehmet, PhD
Vice President and Head of Discovery Research
Zafgen, Inc.
Zafgen, Inc.
• Publicly traded biopharmaceutical company
• Founded 12 years ago (IPO in 2014)
• Virtual company
• Bringing MetAP2 inhibitors to market
• Area of interest: metabolic disease
Zafgen and Diamond Age
Diamond Age acts as a virtual bioinformatics department for Zafgen
• Data Analysis
• Data Management
• Hypothesis generation
• Technology recommendations
What Diamond Age has done for Zafgen
• Transcriptional profiling
• Proteomics/phosphoproteomics
• Metabolomics
• Clinical outcomes
• Custom apps for client needs
The benefits
What can Zafgen do now that it couldn’t before?
• Iterative data generation
• Cross-dataset analyses
• Confidence in analysis results from CROs
• Link between pre-clinical and clinical data
• Cost efficiencies / value for money
Thank you!
Questions?