Polina Shpudeiko
Scientific Programmer, Computational Biology
How can Knowledge Graphs
Support Drug Discovery?
Graph Summit Neo4j, Frankfurt, October 10th, 2023
PAGE 2
Agenda
1. Why do we need knowledge graphs in drug discovery?
2. How can we build them to solve our challenges?
3. What can be done with the power of graphs?
4. Where will it lead us?
PAGE 3
Why do we need knowledge graphs in drug discovery?
PAGE 4
Integration of public and internal knowledge
Towards a comprehensive understanding of diseases and therapies
Public knowledge Internal knowledge
PAGE 5
The life science space is diverse
Navigating the complexity of biology, chemistry and clinics
Genes
Proteins
Mutations
Tissues
Pathways
Cell types
Compounds
Diseases
PAGE 6
The life science space is diverse
Various databases capture the structured public knowledge
And these are only a few examples...
[Figure: example public databases grouped by entity type: Genes, Proteins, Compounds, Mutations, Tissues, Diseases, Pathways, Cell types]
PAGE 7
The literature space adds more complexity
Scientific articles are the key channel for sharing novel knowledge
Statistics
• There are approximately 30,000 journals in the world, a number growing by 5-7% per year
• Scientific research is evolving rapidly, with approximately 2 million new articles published every year
• There are already 36 million articles in the open source database for articles
The ideas in these articles can be extracted using natural language processing (NLP)
PAGE 8
The drug discovery process as our in-house data source
Each step generates novel insights and requires dedicated expertise
Areas of interest: Disease biology, Screening and compound chemistry, Clinics
Target ID and validation
Hit ID and optimisation
Lead optimisation
Candidate selection
Candidate profiling
Clinical trial
PAGE 9
Mission: harmonize data, understand diseases and support the development of new therapies
Integration of public and internal knowledge
Combining them will lead us towards novel target discovery at Evotec
Public knowledge Internal knowledge
PAGE 10
How can we build them to solve our challenges?
PAGE 11
Challenge of managing diverse data
PAGE 12
Experimental data
• The public ontology space is not standardized: there are multiple distinct ontologies for diseases
• The public ontology space is incomplete: ontologies do not cover cutting-edge science and novel associations
The ontological space is complex and incomplete
Stable and reliable data models and custom ontologies are essential
PAGE 13
Knowledge graph as harmonisation tool
Bringing together heterogeneous biological data in one place
• 15 databases
• 30 million nodes and 100 million connections, and counting
• Harmonising diseases and traits required a deep understanding of ontologies (hierarchical and semantic connections between different entities)
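The harmonisation step can be illustrated with a small sketch. It assumes that cross-references between ontologies are available as pairs of equivalent identifiers, and it groups them into one canonical cluster per disease with a union-find structure. The identifiers (`ONT_A:001` and so on) and the function name `harmonise` are hypothetical placeholders, not the identifiers or the method used at Evotec.

```python
from collections import defaultdict


def harmonise(xrefs):
    """Group identifiers linked by cross-references into clusters.

    xrefs: iterable of (id_a, id_b) pairs that denote the same concept.
    Returns a dict mapping a representative id to the set of all ids
    in its cluster. The representative is the smallest id in the cluster.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in xrefs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root as representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    clusters = defaultdict(set)
    for term in list(parent):
        clusters[find(term)].add(term)
    return dict(clusters)


# Hypothetical cross-references between three made-up ontologies.
xrefs = [
    ("ONT_A:001", "ONT_B:417"),
    ("ONT_B:417", "ONT_C:093"),
    ("ONT_A:002", "ONT_C:120"),
]
for canonical, members in sorted(harmonise(xrefs).items()):
    print(canonical, sorted(members))
```

In a graph database, each cluster would typically become one disease node carrying all of its source identifiers as properties. Plain equivalence is only the simplest case: hierarchical (parent/child) relations between ontologies need separate handling.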
PAGE 14
• Extract public knowledge from scientific articles with NLP
• Overlay de novo mined knowledge with the ontological database space
Knowledge graph as integration tool
Integration of literature data with NLP approaches
Pathways
Tissue
Genes
Compounds
Diseases
Traits
Mutations
NLP
Article
PAGE 15
[Figure: knowledge graph example with the nodes "PMID", "BAIAP2" and "Depressive disorder" and the label "prevent"]
• Natural Language Processing (NLP) extracts the key entities mentioned in the articles
• NLP-powered search engines can understand the context and semantics of queries
• Ontologies help to harmonize the extracted knowledge in one graph
Knowledge graph as tool for integration
Integration of literature data with NLP approaches
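The idea of turning article text into graph edges can be sketched as follows. This is a deliberately simplified stand-in: it uses plain dictionary matching instead of a real NLP pipeline, and the function name `extract_comentions`, the placeholder gene `GENE_X`, the identifier `PMID:0` and the example sentence are all invented for illustration.

```python
import re


def extract_comentions(pmid, text, genes, diseases):
    """Return (pmid, gene, disease) triples for terms found in the text.

    Matching is case-insensitive and respects word boundaries. A real
    pipeline would use a trained entity recogniser and map the hits to
    ontology identifiers; this is only a dictionary lookup.
    """
    lowered = text.lower()

    def mentioned(term):
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        return re.search(pattern, lowered) is not None

    found_genes = [g for g in genes if mentioned(g)]
    found_diseases = [d for d in diseases if mentioned(d)]
    return [(pmid, g, d) for g in found_genes for d in found_diseases]


# Made-up sentence, used only to exercise the function.
sentence = "BAIAP2 is mentioned together with depressive disorder in this made-up sentence."
triples = extract_comentions(
    "PMID:0",
    sentence,
    genes=["BAIAP2", "GENE_X"],
    diseases=["Depressive disorder", "Fibrosis"],
)
print(triples)
```

Each triple corresponds to one article node connected to one gene node and one disease node. Harmonised ontology identifiers would replace the raw strings before loading into the graph.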
PAGE 16
• Signatures are a representation of internal experimental knowledge
• Example: genes that change their expression in response to therapy
Pathways
Tissue
Genes
Compounds
Diseases
Traits
Mutations
NLP
Article
Knowledge graph as tool for unification
Combining internal knowledge with public data
Signatures
PAGE 17
What can be done with the power of graphs?
PAGE 18
Disease tree ontology: Fibrosis
Integration of public and internal data in the graph
Using public knowledge for target identification from patient-derived signatures
Graph representation of a signature
PAGE 19
• Expression data from a large patient cohort can be enriched with heterogeneous data from public (NLP, pathways, cell types) and internal (in-house signatures) resources
• This allows us to better understand the underlying mechanisms that drive the disease
Patient-stratification-based signature
Integration of patient signatures and experimental models
Translational research from animal to human
In vivo signatures
In vitro signatures
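One simple way to compare a patient-derived signature with experimental-model signatures is set overlap. The sketch below ranks models by Jaccard similarity of their gene sets. This is an illustrative assumption, not the comparison described in the talk: the gene names are placeholders, and it ignores the direction of expression change and the mapping between animal and human genes.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_models(patient_signature, model_signatures):
    """Rank model signatures by their overlap with the patient signature.

    Returns (name, score) pairs, highest score first; ties are broken
    alphabetically by name.
    """
    scores = {
        name: jaccard(patient_signature, genes)
        for name, genes in model_signatures.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder gene sets.
patient = {"G1", "G2", "G3"}
models = {
    "in_vivo": {"G1", "G2", "G4"},
    "in_vitro": {"G3", "G5", "G6", "G7"},
}
for name, score in rank_models(patient, models):
    print(name, round(score, 3))
```

In the graph, the shared genes also expose their public annotations (pathways, cell types, co-mentions), which a plain set comparison like this one does not capture.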
PAGE 20
[Figure: Kidney Diseases, Genes, Any Connected Disease]
• The disease space is defined by a parent term of the ontology (Kidney Disease)
• All NLP co-mentions of child diseases with genes are collected
• To determine specificity, all other diseases that were co-mentioned are added
• Co-mention edges are weighted by the number of unique articles
Defining molecular disease spaces
Based on internal experimental data and NLP-mined external knowledge
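The four steps for defining a disease space can be sketched in plain Python. The sketch assumes the ontology is given as a parent-to-children mapping and the NLP output as (article, gene, disease) triples. The function names, the gene names and the article identifiers are hypothetical, and in practice this would more likely run as a query against the graph database.

```python
from collections import defaultdict


def descendants(children, root):
    """All terms below root in a parent -> children mapping."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def disease_space(children, root, comentions):
    """Weighted gene-disease edges for the disease space under root.

    comentions: iterable of (article_id, gene, disease) triples.
    Returns {(gene, disease): number of unique articles} for every gene
    co-mentioned with a child disease of root, including that gene's
    edges to diseases outside the space.
    """
    space = descendants(children, root)
    articles = defaultdict(set)
    for article_id, gene, disease in comentions:
        articles[(gene, disease)].add(article_id)
    genes = {gene for (gene, disease) in articles if disease in space}
    return {
        (gene, disease): len(ids)
        for (gene, disease), ids in articles.items()
        if gene in genes
    }


# Toy ontology and co-mentions; all values are placeholders.
children = {"Kidney Diseases": ["Polycystic Kidney Diseases"]}
comentions = [
    (1, "G1", "Polycystic Kidney Diseases"),
    (2, "G1", "Polycystic Kidney Diseases"),
    (1, "G1", "Polycystic Kidney Diseases"),  # same article, counted once
    (2, "G1", "Neoplasms"),
    (3, "G2", "Neoplasms"),  # G2 never touches the space, so it is dropped
]
print(disease_space(children, "Kidney Diseases", comentions))
```

Counting unique articles rather than raw mentions keeps a single article that repeats a gene name many times from inflating the edge weight.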
PAGE 21
Defining molecular disease spaces
Identification of kidney-specific genes in the embeddings of the kidney disease space
[Figure: embedding plot with the disease groups Neoplasms, Infectious Diseases, Kidney Diseases and Polycystic Kidney Diseases]
• Genes associated with genetic kidney diseases
• Genes involved in infectious diseases
• Genes that take part in cancer and are not specific to the disease space of interest
• Genes that drive kidney diseases are the most important target candidates
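The slide identifies kidney-specific genes from graph embeddings. As a much simpler stand-in, and not the method shown in the talk, specificity can be approximated by the share of a gene's co-mention weight that falls inside the disease space. The function name `specificity`, the gene names and the weights below are invented for illustration.

```python
def specificity(edges, space):
    """Rank genes by the fraction of their edge weight inside the space.

    edges: {(gene, disease): weight}, e.g. unique-article counts.
    space: set of diseases that make up the disease space of interest.
    Returns (gene, score) pairs, most specific first; ties are broken
    alphabetically by gene name.
    """
    inside, total = {}, {}
    for (gene, disease), weight in edges.items():
        total[gene] = total.get(gene, 0) + weight
        if disease in space:
            inside[gene] = inside.get(gene, 0) + weight
    # Skip genes whose total weight is zero to avoid dividing by zero.
    scores = {
        gene: inside.get(gene, 0) / weight_sum
        for gene, weight_sum in total.items()
        if weight_sum > 0
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder weights: G1 is mostly co-mentioned with the kidney term,
# G2 mostly with neoplasms.
edges = {
    ("G1", "Polycystic Kidney Diseases"): 4,
    ("G1", "Neoplasms"): 1,
    ("G2", "Polycystic Kidney Diseases"): 1,
    ("G2", "Neoplasms"): 9,
}
print(specificity(edges, {"Polycystic Kidney Diseases"}))
```

A ratio like this treats every disease outside the space equally. Embeddings can additionally capture how genes cluster with related diseases, which is presumably why the talk uses them.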
PAGE 22
Sharing data insights
NeoDash solution for internal knowledge sharing
PAGE 23
Where will it lead us?
PAGE 24
Summary and outlook
• Graphs are powerful tools for data harmonization in the diverse life science space: they bring ontologies together
• Bringing public and internal knowledge into one place with graphs allowed us to characterize internal signatures in the most efficient way
• Applying diverse graph algorithms helps us uncover hidden insights in our data, such as identifying the genes that are specific to the disease of interest and have the highest potential for Target ID
Polina Shpudeiko
Scientific Programmer, Computational Biology
polina.shpudeiko@evotec.com
