The promise of graphs & graph-based learning in drug discovery
Ufuk Kirik
Data Science & Quantitative Biology, Discovery Sciences, R&D
AstraZeneca Gothenburg, Sweden
2022-09-05
Outline
• Background
• Drug Discovery Landscape
• Graphs in Life Science (Research)
• The promise
• How we use them
• What have we learned so far
• The “so what?”
Background
Who am I?
Curious, nerdy kid
Engineer/Applied mathematician
Bioinformatician
Data scientist
Graph enthusiast
What do I do?
• Lead a team working with KG Insights
• Develop processes to enrich our graphs with a wide variety of in-house generated data
• Develop tools to effectively explore and exploit our graphs, catering to a broad range of users.
What do I want to achieve with this talk?
• Our journey into the graph world
• Our challenges and approach to graph-based learning
• Our learnings, generalized for a wider audience
• The talk will not include any graphics or specifics from our graphs, due to the sensitive nature of our project.
Drug discovery landscape
• What does it take to bring a new medicine to a patient?
[Figure: pipeline stages Pre-clinical, Clinical (Ph. I, Ph. II, Ph. III) and Follow-up, with the annotation "(x10)"]
• Bottleneck: Phase II
• Reason(s): Efficacy & Safety
• Problem: Time & Money
Cook, D., Brown, D., Alexander, R. et al. Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat Rev Drug Discov 13, 419–431 (2014). https://doi.org/10.1038/nrd4309
Enter graphs…
What is so cool about graphs anyways?
• Connected information is everywhere
• Graphs are a natural way to represent connected information
• Graph theory is old and rich; most importantly, it is incredibly useful
• ML on graphs is extremely useful:
  • Movie/Book/Product recommendations (e.g. Amazon, Netflix, Spotify)
  • Personal/Professional networks (e.g. social media)
• Like all things ML, sometimes it can go terribly wrong too!
Graphs in life sciences
• Nothing works in isolation in the biological domain!
• Crash course in cell biology:
  • Central dogma: DNA -> RNA -> Proteins -> Function
  • Pathways & Functional annotations
  • Cells & Tissues ...
  • Compounds & Diseases ...
Analogy - if it was all about people:
• Genes would be the general identifiable information: name, address, maybe social security number, etc.
  [Ufuk Kirik, the data scientist, lives in Gothenburg]
• Proteins would be the actual physical presence of me: what we do and what we look like while doing it.
  [The person standing in front of you; here I am talking, dressed smart casual]
• What a gene/protein is and what it does varies with the context
• Not only does the activity change; whom I do these activities with also changes, based on time and location.
• If I do things I am not supposed to, or not allowed to, then we end up having problems, which is essentially what happens in the case of most diseases.
• Now that is where the analogy falls short: although there is only one of me, proteins can, and typically do, have many copies co-existing at the same time.
• Nevertheless, all of these interactions can be captured with a graph structure. I would actually go as far as to speculate that a graph is the only intuitive way to represent information like this.
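To make the idea of capturing these interactions in a graph structure a little more concrete, here is a minimal sketch that stores typed nodes and edges as (head, relation, tail) triples and builds a one-hop adjacency index. Every identifier, node type and relation label below is a hypothetical placeholder chosen for illustration; none of it comes from the graphs discussed in this talk.

```python
from collections import defaultdict

# Hypothetical node types; identifiers are placeholders for illustration only.
NODE_TYPES = {
    "GENE_A": "Gene",
    "PROTEIN_A": "Protein",
    "PATHWAY_P": "Pathway",
    "TISSUE_T": "Tissue",
    "DISEASE_D": "Disease",
    "COMPOUND_C": "Compound",
}

# Each edge is a (head, relation, tail) triple; relation names are assumptions.
TRIPLES = [
    ("GENE_A", "encodes", "PROTEIN_A"),
    ("PROTEIN_A", "participates_in", "PATHWAY_P"),
    ("PROTEIN_A", "expressed_in", "TISSUE_T"),
    ("GENE_A", "associates", "DISEASE_D"),
    ("COMPOUND_C", "binds", "PROTEIN_A"),
    ("COMPOUND_C", "treats", "DISEASE_D"),
]


def build_adjacency(triples):
    """Index outgoing edges per node as {head: [(relation, tail), ...]}."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


def neighbours(adjacency, node):
    """Return the (relation, tail) pairs reachable in one hop from node."""
    return list(adjacency.get(node, []))


adjacency = build_adjacency(TRIPLES)
for relation, tail in neighbours(adjacency, "PROTEIN_A"):
    print(f"PROTEIN_A -[{relation}]-> {tail} ({NODE_TYPES[tail]})")
```

The same protein appears in a pathway and in a tissue, which is one simple way the "context" from the analogy could show up as edges.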
Biomedical knowledge graphs
Himmelstein, D. S., & Baranzini, S. E. (2015). Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes. PLoS computational biology, 11(7), e1004259. https://doi.org/10.1371/journal.pcbi.1004259
How do we deploy graphs?
• We refer to graphs in the context of knowledge graphs, where we aggregate data from a variety of sources: public and internal, conceptual and experimental alike.
Graph composition
• These data sources can then be transformed into a knowledge graph through a composition stage.
• Unfortunately, many biases can be present during this phase.
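A composition stage could look roughly like the sketch below: triples from several sources are harmonised and merged, and each edge remembers which sources contributed it. This is an illustrative assumption about what composition might involve, not a description of the actual pipeline; the source names, identifiers and the `normalise` rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical source tables; every identifier below is a placeholder.
PUBLIC_SOURCE = [
    ("gene_a", "associates", "disease_d"),
    ("compound_c", "treats", "disease_d"),
]
INTERNAL_SOURCE = [
    ("GENE_A", "associates", "DISEASE_D"),  # same fact, different casing
    ("GENE_B", "associates", "DISEASE_D"),
]


def normalise(identifier):
    """Toy identifier harmonisation: trim whitespace and upper-case."""
    return identifier.strip().upper()


def compose(sources):
    """Merge triples from named sources, keeping provenance per edge."""
    provenance = defaultdict(set)
    for source_name, triples in sources.items():
        for head, relation, tail in triples:
            edge = (normalise(head), relation, normalise(tail))
            provenance[edge].add(source_name)
    return dict(provenance)


graph = compose({"public": PUBLIC_SOURCE, "internal": INTERNAL_SOURCE})
for edge, edge_sources in sorted(graph.items()):
    print(edge, sorted(edge_sources))
```

Keeping provenance on every edge is one way to notice when a part of the graph is supported by a single source only, which is a place where composition biases may hide.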
How do we deploy graphs?
• We use these graphs for both predictive and explorative purposes
• Predictive examples:
  • Which gene could be a good drug target for a given disease?
  • Which compound could be a good/useful treatment for a given disease?
• Explorative examples:
  • What do we know about gene X (in the context of disease D)?
  • What level of evidence do we have that supports/contradicts a putative use of compound C for a new indication (that is, a new disease)?
Example
• Find all/some genes that associate with Parkinson's: Parkinson's -> associates -> ?
• What is the likelihood of an associates edge between HP and Parkinson's Disease?
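The first question is a pattern lookup, which can be sketched over a plain list of triples as below, with `None` playing the role of the "?" wildcard. The triples and gene names are hypothetical placeholders. The second question asks about an edge that may not be in the graph at all, so a lookup can only report whether the edge is already known; estimating its likelihood needs a predictive model.

```python
# Hypothetical triples; GENE_X, GENE_Y, SYMPTOM_S and DISEASE_D are placeholders.
TRIPLES = [
    ("Parkinson's", "associates", "GENE_X"),
    ("Parkinson's", "associates", "GENE_Y"),
    ("Parkinson's", "presents", "SYMPTOM_S"),
    ("DISEASE_D", "associates", "GENE_X"),
]


def match(triples, head=None, relation=None, tail=None):
    """Return triples matching the pattern; None acts as the '?' wildcard."""
    return [
        (h, r, t)
        for (h, r, t) in triples
        if (head is None or h == head)
        and (relation is None or r == relation)
        and (tail is None or t == tail)
    ]


# Parkinson's -> associates -> ?
genes = [t for _, _, t in match(TRIPLES, head="Parkinson's", relation="associates")]
print(genes)

# Is the HP edge already recorded? A lookup cannot say how likely a missing edge is.
known = bool(match(TRIPLES, "Parkinson's", "associates", "HP"))
print(known)
```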
Biomedical knowledge graphs
[Figure: a knowledge graph embedding model (KGEM) with an entity embedding matrix and a relation embedding matrix]
Embeddings are learned for each entity (gene, disease, compound, etc.) and relation in the graph. The embedding dimension typically falls within the range of 32-1024.
Himmelstein, D. S., & Baranzini, S. E. (2015). Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes. PLoS computational biology, 11(7), e1004259. https://doi.org/10.1371/journal.pcbi.1004259
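The slide does not name a specific embedding model, so the sketch below swaps in a TransE-style score purely as a familiar example: a triple (h, r, t) is considered plausible when the vector h + r lies close to t. The embeddings here are random stand-ins rather than learned ones, so the printed ranking carries no biological meaning; the gene names other than HP are hypothetical placeholders.

```python
import math
import random

EMBEDDING_DIM = 32  # lower end of the 32-1024 range mentioned above


def init_embeddings(names, dim, rng):
    """Create one random vector per name (a stand-in for learned embeddings)."""
    return {name: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for name in names}


def transe_score(head_vec, relation_vec, tail_vec):
    """TransE-style plausibility: negative L2 distance between (h + r) and t."""
    squared = sum((h + r - t) ** 2 for h, r, t in zip(head_vec, relation_vec, tail_vec))
    return -math.sqrt(squared)


def rank_tails(head, relation, candidates, entities, relations):
    """Sort candidate tails from most to least plausible for (head, relation, ?)."""
    scored = [
        (tail, transe_score(entities[head], relations[relation], entities[tail]))
        for tail in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
entities = init_embeddings(["Parkinson's", "HP", "GENE_X", "GENE_Y"], EMBEDDING_DIM, rng)
relations = init_embeddings(["associates"], EMBEDDING_DIM, rng)

ranking = rank_tails("Parkinson's", "associates", ["HP", "GENE_X", "GENE_Y"], entities, relations)
for tail, score in ranking:
    print(f"{tail}: {score:.3f}")
```

In a real setting the vectors would be trained so that known edges score higher than corrupted ones, and the ranked list would feed the review and validation steps that follow.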
Summary: KG use for target prediction
The graph: constituent data + composition
ML model
Ranking targets
Human review
Experimental validation
What are our learnings?
Lesson 1: First and foremost, data modelling is absolutely critical for your success in anything you might want to do with the graph
• We have noticed that the graph topology, that is the "structure", has direct and large effects on predictive performance*
• Predictive performance depends heavily on the choice of ML approach, and no method appears to be indisputably the best. Models have their shortcomings, and these tie in with the graph topology and the underlying data model.
• Building a good data model is not easy: there are difficult decisions to be made, and you have to live with the consequences of these decisions.
  • Disease vs Phenotype vs Symptom
  • Gene vs Transcript vs Protein
* Stephen Bonner, Ufuk Kirik, Ola Engkvist, Jian Tang, Ian P Barrett, Implications of topological imbalance for representation learning on biomedical knowledge graphs, Briefings in Bioinformatics, 2022, bbac279, https://doi.org/10.1093/bib/bbac279
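One simple way to start looking at graph topology is to count how many edges touch each node. The sketch below does only that, on a toy graph with hypothetical placeholder names in which one disease is deliberately over-connected; it is not the analysis from the cited paper, just a first diagnostic one might run before training any model.

```python
from collections import Counter

# Toy triples with placeholder names; DISEASE_D is deliberately over-connected.
TRIPLES = [
    ("GENE_A", "associates", "DISEASE_D"),
    ("GENE_B", "associates", "DISEASE_D"),
    ("GENE_C", "associates", "DISEASE_D"),
    ("GENE_D", "associates", "DISEASE_D"),
    ("GENE_A", "associates", "DISEASE_E"),
]


def node_degrees(triples):
    """Count how many edges touch each node, ignoring edge direction."""
    degrees = Counter()
    for head, _relation, tail in triples:
        degrees[head] += 1
        degrees[tail] += 1
    return degrees


degrees = node_degrees(TRIPLES)
for node, degree in degrees.most_common():
    print(node, degree)
```

A heavily skewed degree list is a hint that a model may end up favouring well-connected entities regardless of the question being asked.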
Lesson 2: Cross-disciplinary science can only be done in cross-disciplinary teams!
• Get your subject-matter experts (SMEs) involved in the data modelling, in feature engineering, and in prediction (if you can). We have cross-functional teams with many different areas of expertise, typically working together on a daily basis.
• Corollary: Effective communication is key! Learn to speak each other's language
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Lesson 3: Don't rely on a single analytical/predictive method; experiment with different models (that utilize different portions of the graph, or at least weight them differently).
• Continually evaluate and sanity-check your results. After all, a computational model is a fancy calculator, no matter how advanced.
• It is also worth considering what the cost of a bad prediction is for your business:
  • For Facebook or LinkedIn it might not be the end of the world to suggest someone you don't know as a contact
  • That is not the world we live in, and it may not be the world you live in either.
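One possible way to act on Lesson 3 is to compare the ranked outputs of several models and combine them, for instance by averaging each candidate's rank. The sketch below shows that idea under stated assumptions: the model names, the candidate genes and the rule for candidates missing from a list are all hypothetical, and mean rank is only one of many aggregation schemes.

```python
def mean_rank(rankings):
    """Average each candidate's 1-based position across several ranked lists.

    Candidates missing from a list are placed one position past its end.
    """
    candidates = {c for ranking in rankings.values() for c in ranking}
    averaged = {}
    for candidate in candidates:
        positions = [
            ranking.index(candidate) + 1 if candidate in ranking else len(ranking) + 1
            for ranking in rankings.values()
        ]
        averaged[candidate] = sum(positions) / len(positions)
    # Lower mean rank is better; ties are broken alphabetically for determinism.
    return sorted(averaged.items(), key=lambda item: (item[1], item[0]))


# Hypothetical outputs of three different models for the same query.
rankings = {
    "model_1": ["GENE_A", "GENE_B", "GENE_C"],
    "model_2": ["GENE_B", "GENE_A", "GENE_C"],
    "model_3": ["GENE_A", "GENE_C"],
}

consensus = mean_rank(rankings)
for candidate, rank in consensus:
    print(f"{candidate}: mean rank {rank:.2f}")
```

Candidates on which the models disagree strongly are natural ones to flag for the sanity checks and human review mentioned above.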
So what?
I. Intuitive representation of massive amounts of connected data: better exploration!
II. Use the years of wisdom from graph theory, as well as the recent advances in graph-based learning, for better predictions (remember: the cost of failure/mistake)
III. There are great tools provided by the community; the collective expertise is incredibly valuable, use it well!
Thank you!

AstraZeneca - The promise of graphs & graph-based learning in drug discovery

  • 1. The promise of graphs & graph-based learning in drug discovery Ufuk Kirik Data Science & Quantitative Biology, Discovery Sciences, R&D AstraZeneca Gothenburg, Sweden 2022-09-05
  • 2. Outline • Background • Drug Discovery Landscape • Graphs in Life Science (Research) • The promise • How we use them • What have we learned so far • The “so what?” 2
  • 3. Background 3 Who am I? Curious, nerdy kid Engineer/Applied mathematician Bioinformatician Data scientist Graph enthusiast
  • 4. Background 4 Who am I? What do I do? • Lead a team working with KG Insights • Develop processes to enrich our graphs with a wide variety of in-house generated data • Develop tools to effectively explore and exploit our graphs, catering to a broad range of users. Curious, nerdy kid Engineer/Applied mathematician Bioinformatician Data scientist Graph enthusiast
  • 5. Background What do I want to achieve with this talk? • Our journey into the graph world • Our challenges and approach to graph-based learning • Our learnings, generalized for wider audience 5 Who am I?
  • 6. Background What do I want to achieve with this talk? • Our journey into the graph world • Our challenges and approach to graph-based learning • Our learnings, generalized for wider audience • Will not include any graphics or specifics from our graphs, due to the sensitive nature of our project. 6 Who am I?
  • 7. Pre-clinical Clinical Follow-up Drug discovery landscape • What does it take to bring a new medicine to a patient? 7
  • 8. Pre-clinical Clinical Follow-up Drug discovery landscape • What does it take to bring a new medicine to a patient? 8 Ph. I Ph. II Ph. III (x10)
  • 9. Drug discovery landscape • What does it take to bring a new medicine to a patient? 9 Cook, D., Brown, D., Alexander, R. et al. Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat Rev Drug Discov 13, 419–431 (2014). https://doi.org/10.1038/nrd4309 Bottleneck: Phase II Reason(s): Efficacy & Safety Problem: Time & Money
  • 11. What is so cool about graphs anyways? • Connected information everywhere • Graphs are a natural way to represent connected information • Graph theory is old and rich, most importantly it is incredibly useful 12
  • 12. What is so cool about graphs anyways? • Connected information everywhere • Graphs are a natural way to represent connected information • Graph theory is old and rich, most importantly it is incredibly useful 13
  • 13. What is so cool about graphs anyways? • Connected information everywhere • Graphs are a natural way to represent connected information • Graph theory is old and rich, most importantly it is incredibly useful 14
  • 14. What is so cool about graphs anyways? • Connected information everywhere • Graphs are a natural way to represent connected information • Graph theory is old and rich, most importantly it is incredibly useful • ML on graphs is extremely useful: • Movie/Book/Product recommendations (e.g. Amazon, Netflix, Spotify) • Personal/Professional network (e.g. social media) • Like all things ML, sometimes it can go terribly wrong too! 15
  • 15. Graphs in life sciences • Nothing works in isolation in the biological domain! 16
  • 16. Graphs in life sciences • Nothing works in isolation in the biological domain! • Crash course in cell biology: • Central dogma DNA -> RNA -> Proteins -> Function 17
  • 17. Graphs in life sciences • Nothing works in isolation in the biological domain! • Crash course in cell biology: • Central dogma DNA -> RNA -> Proteins -> Function • Pathways & Functional annotations • Cells & Tissues ... • Compounds & Diseases ... 18
  • 18. Graphs in life sciences • Nothing works in isolation in the biological domain! 19
  • 19. Analogy - if it was all about people: • genes would be the general identifiable information: name, address, maybe social security number etc [Ufuk Kirik, The data scientist, Lives in Gothenburg] 20
  • 20. Analogy - if it was all about people: • genes would be the general identifiable information: name, address, maybe social security number etc [Ufuk Kirik, The data scientist, Lives in Gothenburg] • proteins would be the actual physical presence; what we do and how we look like while doing that. [The person standing in front of you, here I am talking, dressed smart casual] 21
  • 21. Analogy - if it was all about people: • genes would be the general identifiable information: name, address, maybe social security number etc [Ufuk Kirik, The data scientist, Lives in Gothenburg] • proteins would be the actual physical presence of me; what we do and how we look like while doing that. [The person standing in front of you, here I am talking, dressed smart casual] • What a gene/protein is and what it does varies on the context • Not only the activity changes but also with whom I do these activities change based on time and location. 22
  • 22. Analogy - if it was all about people: • If I do things I am not supposed to, or allowed to, then we end up having problems, which is essentially what happens in the case of most diseases. • Now that is where the analogy falls short, since although there is only 1 of me, proteins can, and typically do, have many copies co-existing at the same. • Nevertheless all of these interactions can be captured with a graph structure. I would actually go as far to speculate that a graph is the only intuitive way to represent information like this. 23
  • 23. Biomedical knowledge graphs 24 Himmelstein, D. S., & Baranzini, S. E. (2015). Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease- Associated Genes. PLoS computational biology, 11(7), e1004259. https://doi.org/10.1371/journal.pcbi.1004259
  • 24. How do we deploy graphs? • We refer to graphs in the context of knowledge graphs, where we aggregate data from various different sources; public and internal, conceptual and experimental alike. 25
  • 25. Graph composition 26 Graph Composition • These data sources can then be transformed into a knowledge graph through a composition stage. • Unfortunately, many biases can be present during this phase.
  • 26. How do we deploy graphs? • We refer to graphs in the context of knowledge graphs, where we aggregate data from various different sources; public and internal, conceptual and experimental alike. • We use these graphs for both predictive and explorative purposes 27
  • 27. How do we deploy graphs? • We refer to graphs in the context of knowledge graphs, where we aggregate data from various different sources; public and internal, conceptual and experimental alike. • We use these graphs for both predictive and explorative purposes • Predictive examples: • which gene could be a good drug target for a given disease? • which compound could be a good/useful treatment for a given disease? 28
  • 28. How do we deploy graphs? • We refer to graphs in the context of knowledge graphs, where we aggregate data from various different sources; public and internal, conceptual and experimental alike. • We use these graphs for both predictive and explorative purposes • Predictive examples: • which gene could be a good drug target for a given disease? • which compound could be a good/useful treatment for a given disease? • Explorative examples: • what do we know about gene X (in the context of disease D)? • what level of evidence do we have that supports/contradicts a putative use of compound C for a new indication (that is a new disease)? 29
  • 29. Example • (Find all/some genes that associates with Parkinson's | Parkinson's -> associates -> ?). 30
  • 30. Example • Find all/some genes that associates with Parkinson's | Parkinson's -> associates -> ?. • What is the likelihood of an associates edge between HP and Parkinson’s Disease? 31
  • 31. Biomedical knowledge graphs 32 Entity Embedding matrix Relation Embedding matrix KGEM Embeddings are learned for each entity (gene, disease, compound, etc) and relation in the graph. The embedding dimension typically falls with the range of 32-1024. Himmelstein, D. S., & Baranzini, S. E. (2015). Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease- Associated Genes. PLoS computational biology, 11(7), e1004259. https://doi.org/10.1371/journal.pcbi.1004259
  • 32. 33 Summary: KG use for target prediction The graph: constituent data + composition ML model Ranking targets Human review Experimental validation
  • 33. 34 Summary: KG use for target prediction The graph: constituent data + composition ML model Ranking targets Human review Experimental validation
  • 34. 35 The graph: constituent data + composition ML model Ranking targets Human review Experimental validation Summary: KG use for target prediction
  • 35. What are our learnings? Lesson 1: First and foremost, data modelling is absolutely critical for your success in anything you might want to do with the graph 36
  • 36. What are our learnings? Lesson 1: First and foremost, data modelling is absolutely critical for your success in anything you might want to do with the graph • We have noticed that the graph topology, that is the "structure”, has direct and large effects on predictive performance* • Predictive performance depends heavily on the choice of ML approach, and no method appear to be indisputably the best. Models have their shortcomings, and that ties in together with the graph topology and the underlying data model. 37 * Stephen Bonner, Ufuk Kirik, Ola Engkvist, Jian Tang, Ian P Barrett, Implications of topological imbalance for representation learning on biomedical knowledge graphs, Briefings in Bioinformatics, 2022;, bbac279, https://doi.org/10.1093/bib/bbac279
  • 37. What are our learnings? Lesson 1: First and foremost, data modelling is absolutely critical for your success in anything you might want to do with the graph • Building a good data model is not easy: there are difficult decisions to be made, and you have to live with the consequences of these decisions. ➢ Disease vs Phenotype vs Symptom ➢ Gene vs Transcript vs Protein
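One simple way to probe the effect of graph topology on predictions is to check how strongly a model's scores track node degree. The sketch below is an illustration under assumptions, not the analysis from the cited paper. The triples, the scores, and the helper names `node_degrees` and `degree_score_correlation` are hypothetical.

```python
from collections import Counter
import numpy as np

# Hypothetical toy triples (head, relation, tail); not from any real graph.
triples = [
    ("gene_A", "associates", "disease_X"),
    ("gene_A", "associates", "disease_Y"),
    ("gene_A", "interacts", "gene_B"),
    ("gene_A", "interacts", "gene_C"),
    ("gene_B", "associates", "disease_X"),
    ("gene_C", "interacts", "gene_D"),
]

def node_degrees(triples):
    """Count how many edges touch each entity, ignoring direction."""
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree

def degree_score_correlation(degree, scores):
    """Pearson correlation between node degree and a model's score per node."""
    names = sorted(scores)
    d = np.array([degree[n] for n in names], dtype=float)
    s = np.array([scores[n] for n in names], dtype=float)
    return float(np.corrcoef(d, s)[0, 1])

degree = node_degrees(triples)

# Hypothetical model scores for candidate genes against some disease.
scores = {"gene_A": 0.9, "gene_B": 0.6, "gene_C": 0.5, "gene_D": 0.1}
corr = degree_score_correlation(degree, scores)
```

A correlation close to 1 would suggest the model is largely ranking well-connected nodes first, which is worth knowing before treating a high score as biological evidence.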
  • 40. What are our learnings? Lesson 2: Cross-disciplinary science can only be done in cross-disciplinary teams! • Get your SMEs involved in the data modelling, in feature engineering, and in prediction (if you can). We have cross-functional teams in which many different areas of expertise work together on a daily basis. • Corollary: Effective communication is key! Learn to speak each other's language. (Data science Venn diagram: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)
  • 43. What are our learnings? Lesson 3: Don't rely on a single analytical/predictive method; experiment with different models (that use different portions of the graph, or at least weight them differently). • Continually evaluate and sanity-check your results. After all, a computational model is a fancy calculator, no matter how advanced. • Also consider what a bad prediction costs your business: • For Facebook or LinkedIn, suggesting someone you don't know as a contact might not be the end of the world. • That is not the world we live in, and may not be the world you live in either.
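Lesson 3 can be sketched in a few lines: combine the rankings from several models, then sanity-check each ranking against associations that are already known. Reciprocal rank fusion and hits@k are used here as common, generic choices. They are assumptions for illustration and not necessarily what the speaker's team uses. The gene names and rankings are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine ranked lists: each item earns 1 / (k + rank) per list."""
    fused = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            fused[item] = fused.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda item: fused[item], reverse=True)

def hits_at_k(ranking, known_positives, k):
    """Fraction of known positives that appear in the top k of a ranking."""
    top = set(ranking[:k])
    return sum(1 for p in known_positives if p in top) / len(known_positives)

# Hypothetical rankings from two different models over the same candidates.
model_a = ["gene_1", "gene_2", "gene_3", "gene_4"]
model_b = ["gene_3", "gene_1", "gene_4", "gene_2"]
known = {"gene_1", "gene_3"}  # held-out, already validated associations

fused = reciprocal_rank_fusion([model_a, model_b])
fused_hits = hits_at_k(fused, known, k=2)
model_a_hits = hits_at_k(model_a, known, k=2)
```

In this toy case the fused list places both known positives in the top two, while `model_a` alone places only one. A recurring check of this kind on held-out known edges is a cheap way to notice when a model's output stops making sense.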
  • 44. So what? I. Intuitive representation of massive amounts of connected data: better exploration! II. Use years of wisdom from graph theory, as well as the recent advances in graph-based learning, for better predictions (remember the cost of a failed or mistaken prediction). III. There are great tools provided by the community; the collective expertise is incredibly valuable, so use it well!