ML & AI in drug development: the hidden part of the iceberg
Paul Agapow
Oncology R&D
July 2021
Public
Disclosure
• No conflicts of interest
• Does not reflect official AZ thought or projects
• Based on experience in current & previous positions
• ML&AI / Health Informatics @AZ
• Data Science Institute @ICL
• Bioinformatics @Health Protection Agency (UK) …
Agenda
1. What is drug development, how does it work?
2. Why ML & AI is difficult in pharma
3. Where ML & AI can be powerful in pharma and what we need to do
1. How we make drugs
Drug development is a long & complex process
• Pathophysiology: identifying and understanding disease, unravelling the molecular machinery, pinpointing targets
• Drug candidates: developing molecules that can be synthesized and delivered safely to the target
• Clinical trials: testing & optimizing trials, dissecting failures and successes, tracking adverse events, seeking regulatory approval
• Post-approval: deciding who gets the drug and how it is reimbursed, tracking long-term adverse events
The tough maths of drug development
• ~$2B and 10 years to develop & launch a drug
• The “valley of death”: most candidate drugs will fail
(Image: ePharmacology.hubpages.com)
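The "tough maths" is mostly compounding attrition. Each stage's pass rate multiplies the previous ones, and the survivors carry the cost of every failure. The sketch below makes that arithmetic concrete. The stage names, probabilities and costs are hypothetical placeholders, not real industry figures. It also assumes stage outcomes are independent, which is a simplification.

```python
from math import prod
from typing import Mapping


def overall_success(stage_probs: Mapping[str, float]) -> float:
    """Chance that a candidate entering the first stage is eventually approved,
    assuming each stage succeeds independently (a simplification)."""
    return prod(stage_probs.values())


def cost_per_approval(stage_probs: Mapping[str, float],
                      stage_costs: Mapping[str, float]) -> float:
    """Expected spend per approved drug: every candidate that reaches a stage
    pays for it, and the total is shared by the few that get through."""
    reach = 1.0   # fraction of starting candidates that reach the current stage
    spend = 0.0   # expected spend per starting candidate
    for stage, p in stage_probs.items():  # dicts keep insertion (stage) order
        spend += reach * stage_costs[stage]
        reach *= p
    return spend / reach


# Hypothetical placeholder inputs, NOT real industry figures.
probs = {"phase1": 0.5, "phase2": 0.3, "phase3": 0.6, "approval": 0.9}
costs = {"phase1": 25.0, "phase2": 60.0, "phase3": 250.0, "approval": 5.0}  # $M
print(f"P(approval) = {overall_success(probs):.3f}")
print(f"Cost per approved drug = ${cost_per_approval(probs, costs):,.0f}M")
```

Spend on failed candidates is divided among the few approvals. As a result, the cost per approved drug rises much faster than the cost of any single trial.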
2. Why ML & AI is difficult in pharma
10 June 2021
“AI will not replace
drug hunters, but drug
hunters who don’t use
AI will be replaced by
those who do.”
-Andrew Hopkins, CEO Exscientia
12 July 2021
The complexity of biomedicine:
• Tens of trillions of cells (estimates range from roughly 30 to 50 trillion) of about 200 types
• Most cells carry 23 pairs of chromosomes
• In total about 6.4 billion base pairs (positions) per diploid genome
• Containing about 18,000 protein-coding genes (or maybe more like 40,000 genes)
• Genetic material elsewhere in the cell (e.g. mitochondrial DNA)
• Epigenetic modifications
• ~1 million different types of molecules
• Lifestyle & history
• Exposure & environment
• Immune system repertoire & priming
• …
Of which we know only a fraction
Why?
• Biology is outrageously complex
• Data is frequently biased, irregular, incomplete and in different formats
• Biomedicine is a label desert: reliable ground-truth labels are rare
• As a consequence:
  • Advances are throttled by domain knowledge
  • It is hard to represent & analyse such a complex domain
  • Suitable data is often scarce
The classic analytical tension
What we tend to solve vs. what we need to solve:
• Easy things vs. useful things
• Available, ideal data vs. incomplete, messy data
• Ground truth vs. unclear biological reality
• Simplify vs. uncertain findings
• “Interesting” vs. needful
• “Table-land” vs. “Network-land”
3. Where can ML & AI be powerful in pharma?
Radiology & imaging are widely used in healthcare
• Imaging captures important data that is difficult to abstract
–E.g. presence, size, shape of a tumor
• Radiologists
–Never enough of them
–Rushed
–Frequently wrong
• But AI is good at interpreting images …
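Suppose an image model has already produced a binary segmentation mask. The quantities named above (presence, size, shape of a tumor) can then be read off with a few lines of array code. This is a minimal sketch with several assumptions. It assumes a 2-D mask with square pixels of known spacing. The function name `describe_lesion` is hypothetical. "Extent" (lesion area divided by bounding-box area) is just one crude shape descriptor chosen for illustration.

```python
import numpy as np


def describe_lesion(mask, pixel_spacing_mm: float = 1.0) -> dict:
    """Summarise a 2-D binary segmentation mask (True = tumor pixel).

    Assumes square pixels of side `pixel_spacing_mm`."""
    mask = np.asarray(mask, dtype=bool)
    n_pixels = int(mask.sum())
    if n_pixels == 0:
        return {"present": False, "area_mm2": 0.0, "extent": 0.0}
    rows, cols = np.nonzero(mask)
    bbox_pixels = (rows.max() - rows.min() + 1) * (cols.max() - cols.min() + 1)
    return {
        "present": True,
        "area_mm2": n_pixels * pixel_spacing_mm ** 2,
        # extent = lesion area / bounding-box area: 1.0 for a filled rectangle,
        # lower for irregular shapes (a crude shape cue)
        "extent": float(n_pixels / bbox_pixels),
    }


# Usage: a toy 10 x 10 mask containing a 4 x 5 rectangular "lesion"
toy = np.zeros((10, 10), dtype=bool)
toy[2:6, 3:8] = True
print(describe_lesion(toy, pixel_spacing_mm=0.5))
```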
Not just X-rays & MRI but microscopes
Cancers are associated with certain proteins, which traditionally have to be stained & examined visually. Deep learning can automate the visual examination for us.
Images: a slide stained for PD-L1 expression; cells that were automatically detected using AI
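The slide does not say how the detected cells are turned into a readout. One established summary for PD-L1 staining is the Tumor Proportion Score (TPS). TPS is the percentage of viable tumor cells that show partial or complete membrane staining. The sketch below assumes a detector has already labelled each cell. The `DetectedCell` record and its field names are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DetectedCell:
    """Hypothetical per-cell output of an AI cell detector."""
    is_tumor: bool           # classified as a viable tumor cell
    membrane_positive: bool  # partial or complete PD-L1 membrane staining


def tumor_proportion_score(cells: Iterable[DetectedCell]) -> float:
    """Percentage of viable tumor cells that are PD-L1 membrane-positive.

    Non-tumor cells (e.g. immune cells) are ignored, even if stained."""
    tumor = [c for c in cells if c.is_tumor]
    if not tumor:
        raise ValueError("no viable tumor cells detected")
    positive = sum(c.membrane_positive for c in tumor)
    return 100.0 * positive / len(tumor)


cells = [DetectedCell(True, True)] * 3 + [
    DetectedCell(True, False),   # unstained tumor cell
    DetectedCell(False, True),   # stained immune cell: excluded from TPS
]
print(f"TPS = {tumor_proportion_score(cells):.0f}%")
```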
Precision medicine: subtypes of diseases & patients
Type 2 diabetes
• Topology-based patient-patient network used to identify distinct subtypes of T2D
(Dudley et al., Sci. Transl. Med. 2015)
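The core idea of a patient-patient network can be sketched with a simpler stand-in. This is not the topology-based method of the cited study. Patients become nodes, each is linked to its k most similar patients, and groups of linked patients are read as candidate subtypes. The assumptions are: numeric features per patient, Euclidean distance on standardised features, and connected components as the grouping rule. All function names are hypothetical.

```python
from typing import List, Set

import numpy as np


def knn_patient_graph(X: np.ndarray, k: int) -> List[Set[int]]:
    """Undirected k-nearest-neighbour graph; rows of X are patients."""
    # Standardise each feature so no single measurement dominates the distance.
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a patient is not its own neighbour
    adj: List[Set[int]] = [set() for _ in range(len(X))]
    for i, row in enumerate(dist):
        for j in np.argsort(row)[:k]:
            adj[i].add(int(j))
            adj[int(j)].add(i)  # symmetrise the edge
    return adj


def connected_components(adj: List[Set[int]]) -> List[List[int]]:
    """Groups of patients linked by any chain of similarity edges."""
    seen: Set[int] = set()
    groups = []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(group))
    return groups


# Toy data: two clearly separated groups of three "patients" (hypothetical)
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]])
print(connected_components(knn_patient_graph(X, k=2)))
```

The choice of k controls how fragmented the network is. A small k splits patients into many small groups, while a large k merges everything into one component.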
COPD
• Transform patients into sequences of diagnosis codes
• Look for over-represented temporal pairs of codes
• Collapse pairs into trajectories of diagnoses
• Combine similar trajectories with graph similarity
(Brunak et al., Nature Commun. 2016)
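The "over-represented temporal pairs" step can be sketched for a single pair of codes. Take the first time each code appears for each patient. Count in how many patients A precedes B versus B precedes A. Then ask whether that asymmetry is larger than a coin flip would produce. This is a simplified sketch, and the cited work's actual statistics may differ. It uses only a minimum-support threshold and a one-sided exact binomial test, and it omits any comparison with background prevalence. Codes "A", "B" and "C" and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def first_occurrences(events):
    """events: (time, code) tuples for one patient -> {code: first time seen}."""
    first = {}
    for t, code in sorted(events):
        first.setdefault(code, t)
    return first


def directional_pairs(patients, min_support: int = 2, alpha: float = 0.05):
    """Ordered code pairs (a, b) where a is first recorded before b in more
    patients than chance would explain (one-sided exact binomial test)."""
    ordered = Counter()
    for events in patients:
        first = first_occurrences(events)
        for a, b in combinations(sorted(first), 2):
            if first[a] < first[b]:
                ordered[(a, b)] += 1
            elif first[b] < first[a]:
                ordered[(b, a)] += 1
            # same-time pairs carry no ordering information and are skipped
    results = []
    for (a, b), n_ab in ordered.items():
        n_ba = ordered.get((b, a), 0)
        if n_ab < min_support or n_ab <= n_ba:
            continue
        n = n_ab + n_ba
        # P(X >= n_ab) for X ~ Binomial(n, 0.5): chance of this much asymmetry
        p_value = sum(comb(n, i) for i in range(n_ab, n + 1)) / 2 ** n
        if p_value < alpha:
            results.append(((a, b), n_ab, n_ba, p_value))
    return sorted(results, key=lambda r: r[3])


patients = [
    [(1, "A"), (3, "B")], [(2, "A"), (5, "B")], [(0, "A"), (1, "B")],
    [(4, "A"), (9, "B")], [(1, "A"), (2, "B")], [(0, "C"), (2, "A")],
]
for pair, n_ab, n_ba, p in directional_pairs(patients):
    print(pair, n_ab, n_ba, round(p, 4))
```

Significant pairs could then be chained (A→B plus B→C gives A→B→C) to form the longer trajectories described on the slide.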
• A lot of biomedical knowledge is associative or relational & multimodal
• Knowledge graphs / graph convolutional networks (GCNs) help us to capture and analyse it
• They have been used to propose new drugs and patient subtypes
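To show what a GCN does on a graph, here is one step of the widely used propagation rule. The rule adds self-loops, applies symmetric degree normalisation, multiplies by the weights, then applies ReLU. It is written in plain NumPy on a toy graph. The node meanings (one drug, two genes, one disease) are hypothetical. Real biomedical knowledge graphs have typed relations and need more elaborate variants. The weights here are fixed identity matrices, so nothing is trained; the point is only to show how information flows along edges.

```python
import numpy as np


def gcn_layer(A: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])          # self-loops keep each node's own features
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # symmetric degree normalisation
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)


# Toy graph (hypothetical): 0 = drug, 1 = gene X, 2 = gene Y, 3 = disease
# Edges: drug-geneX, geneX-disease, geneY-disease
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.eye(4)   # one-hot node features
I = np.eye(4)   # identity weights, to expose pure propagation
Z = gcn_layer(A, gcn_layer(A, H, I), I)
# After two layers the drug's representation carries signal from the disease,
# which is two hops away; this is what lets GCNs suggest drug-disease links.
print(Z[0, 3] > 0)
```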
Software as a Medical Device
• For:
• Diagnosis
• Monitoring
• Interpretation
• Prognosis
• Prioritization …
• Tough regulatory environment
How to manage and interoperate?
Stages: Target ID → Target validation → Discovery → Pre-clinical → Clinical → Commercial → Post-marketing surveillance
Data sources: genetic & genomic data; patient-centric data; sensors & smart devices; interactive media; healthcare information network; market data
We need more data
Interpretability (etc.) is vital
• But what actually is interpretability?
• May feed back into and inspire mechanistic research, but …
• Essential for:
• a smoke test, validation
• checking for bias
• communication
• Likewise calibration
• Important to understand how (un)sure we are
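Calibration is what lets us say how (un)sure a model is. If a model predicts 80% for a group of patients, about 80% of them should turn out positive. The sketch below computes two common checks for binary predictions: a reliability table with an expected calibration error (ECE), and the Brier score. Using 10 equal-width bins is an assumption; other binning schemes give somewhat different ECE values. The toy labels and probabilities are made up for illustration.

```python
import numpy as np


def reliability_table(y_true, y_prob, n_bins: int = 10):
    """Per-bin (bin index, count, mean predicted probability, observed rate)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the top bin.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            rows.append((b, int(in_bin.sum()),
                         float(y_prob[in_bin].mean()), float(y_true[in_bin].mean())))
    return rows


def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    """Count-weighted average gap between predicted and observed rates."""
    n = len(y_prob)
    return sum(count / n * abs(mean_p - rate)
               for _, count, mean_p, rate in reliability_table(y_true, y_prob, n_bins))


def brier_score(y_true, y_prob) -> float:
    """Mean squared difference between predicted probability and outcome."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


# Toy, made-up predictions from an over-confident model
y = [0, 1, 0, 1, 1, 0, 0, 1]
p = [0.9, 0.9, 0.1, 0.9, 0.95, 0.2, 0.8, 0.1]
for row in reliability_table(y, p):
    print(row)
print("ECE:", round(expected_calibration_error(y, p), 3))
print("Brier:", round(brier_score(y, p), 3))
```

The two measures answer slightly different questions. ECE isolates miscalibration, while the Brier score mixes calibration with discrimination, so it is worth reporting both.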
Good (engineering) practices & production quality are vital
Takeaways
• Drug development is an enormously complex process
• Although attractive, ML & AI are often hindered by the nature of the data
• Areas of definite value include subtyping, imaging & knowledge graphs
• Gathering more and wider data, and better engineering, are key to further progress
Some light reading
Academic Press (2021), editor: Ashenden
Looking for work?
• If you are driven by science and passionate about improving lives, why not look at a job in pharma?
• Example jobs at AstraZeneca (visit our careers website for much, much more):
• Principal Data Scientist
• Associate Director Imaging & AI - Imaging & Data Analytics
• Knowledge Graph Engineer
• Data Sciences & AI Graduate Programme
Confidentiality Notice
This file is private and may contain confidential and proprietary information. If you have received this file in error, please notify us and remove
it from your system and note that you must not copy, distribute or take any action in reliance on it. Any unauthorized use or disclosure of the
contents of this file is not permitted and may be unlawful. AstraZeneca PLC, 1 Francis Crick Avenue, Cambridge Biomedical Campus,
Cambridge, CB2 0AA, UK, T: +44(0)203 749 5000, www.astrazeneca.com
