From Advanced Queries to Algorithms to Advanced ML: 3 Pharmaceutical Graph Use Cases
Dr. Alexander Jarasch
• 5 partners + assoc. partners
• 450 researchers
• bundles basic research and clinical trials expertise
• => variety of data
  => unstructured
  => heterogeneous
  => not connected
  => unFAIR
DZD Data and Knowledge Management team
Dr. Alexander Jarasch
Justus Täger
Tim Bleimehl
Angela Dedie
Yaroslav Zdravomyslov
The Challenge


Connecting data (silos) -> get new insights
Easy question -> Difficult to answer
The Challenge


Variety of users / diversity of scientific questions
Scientists
Medical Doctors
Data Scientists
Graph database
Biological question:


Are human T2D genes enzymes acting on metabolites which in turn are regulated in pig diabetes model?




The actual question (from a data point of view):




Is there a connection between A and R?


=> 3s to look into the Excel sheet


Why graph? Easy scientific question


The actual question (from a data point of view):




Is there a connection between A and R?


=> 3s to look into the graph


[Figure: example graph with nodes A, B, C, E, D, F, G, K, Q, R, S, W, Z, U]
Why graph? Easy scientific question
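The "is there a connection between A and R?" question can be sketched as a breadth-first search over an edge list. This is only an illustration: the edges below are hypothetical, since the actual connections are visible only in the original slide figure, and the function names are made up for this example.

```python
from collections import deque

# Hypothetical undirected edges between the lettered nodes on the slide;
# the real connections are only shown in the original figure.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "E"), ("E", "K"),
    ("K", "Q"), ("Q", "R"), ("D", "F"), ("F", "G"),
    ("S", "W"), ("W", "Z"), ("Z", "U"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbours mapping."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(adjacency[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


ADJACENCY = build_adjacency(EDGES)
print(shortest_path(ADJACENCY, "A", "R"))  # a path exists in this toy graph
print(shortest_path(ADJACENCY, "A", "U"))  # None: different component
```

In a graph database the same question is a single path query; the sketch only shows why it is cheap once the data is stored as nodes and relationships.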
Back to the question
Are human T2D genes enzymes acting on metabolites which in turn are regulated in pig diabetes model?
Genomics
Human diabetic data
Genes
SNPs
Proteins
Enzymes
Pathways
Metabolites
Metabolomics
Pre diabetic pig
Metabolites
List of SNPs
List of Genes of (species 1)
List of Proteins of (species 1)
List of loci
List of Enzymes of (species 1)
List of Pathways of (species 1)
List of Metabolites of (species 1)
List of Metabolites of (species 2)
graph
Why graph? -> why not relational
• biomedical data / healthcare data is highly connected
• => variety of data
  => unstructured
  => heterogeneous
  => not connected
  => unFAIR
• easy to model
• extremely flexible / easily adaptable ("re-shaping the graph") vs. static SQL model
• scalable (billions of nodes + relationships on a single machine)
• easy to query (cyclic dependencies)
• Graph Data Science library + graph embeddings
Alzheimer's
cancer
cardiovascular diseases
diabetes
lung diseases
infectious diseases
new hypotheses
Diseases are connected
DZDconnect: Concept
DZD in-house data
Natural Language Processing


Inferring knowledge
Knowledge Graph
DZDconnect: stats
• PROD server: 323m nodes, 1.1bn relationships => 480GB
• DEV server: 1.1bn nodes, 4.8bn relationships
• Single server (60 CPUs, 256GB memory, only SSDs)
• 4 developers
• Neo4j Enterprise (live backup, GDS)
• UI: Flask web server, SemSpect, Neo4j Browser
• Visualization for interactive browsing (SemSpect by derive GmbH)
• Bloom (semi-natural-language queries)
Strata Data Award finalist 2019
bytes4diabetes Award 2020
Graphie Award 2018
We have DB role model
DZDconnect:


data integration + ML
[Figure: Gene, RNA, Protein; relationships CODES, CODES, CODES*]
• Python
• Py2Neo, GraphIO
• Docker Pipeline for orchestration (open-source by DZD)
• Based on integrated data => annotate / enrich
• text matching + Natural Language Processing
• "shortcuts" for queries (reduce #hops)
• inferring knowledge
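One way to read the "shortcuts" bullet is that a two-hop chain such as Gene -> RNA -> Protein is materialized as a direct Gene -> Protein relationship, so later queries need one hop instead of two. The sketch below illustrates that idea on in-memory dictionaries. It is an assumption about what the slide means, not the DZD pipeline; all identifiers are placeholders.

```python
# Hypothetical toy data: placeholder identifiers, not real database IDs.
GENE_TO_RNA = {
    "gene:G1": ["rna:T1", "rna:T2"],
    "gene:G2": ["rna:T3"],
}
RNA_TO_PROTEIN = {
    "rna:T1": ["protein:P1"],
    "rna:T2": ["protein:P1", "protein:P2"],
    "rna:T3": [],  # non-coding transcript in this toy example
}


def derive_shortcuts(gene_to_rna, rna_to_protein):
    """Compose gene->RNA and RNA->protein edges into direct gene->protein edges."""
    shortcuts = {}
    for gene, rnas in gene_to_rna.items():
        proteins = set()
        for rna in rnas:
            proteins.update(rna_to_protein.get(rna, []))
        if proteins:  # only genes that reach at least one protein get a shortcut
            shortcuts[gene] = sorted(proteins)
    return shortcuts


SHORTCUTS = derive_shortcuts(GENE_TO_RNA, RNA_TO_PROTEIN)
print(SHORTCUTS)
```

The trade-off is the usual one for derived data: queries get shorter, but the shortcut edges have to be rebuilt whenever the underlying relationships change.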
DZDconnect:


data model <-> human readable = easy to query
DZDconnect:


data model
The Challenge


User with a specific input => specific output
Scientist
multi-omics experiment output
Flask app
The Challenge


User ”start somewhere -> explore freely knowledge”
SemSpect interactive browsing
Start from any node
Scientist or Medical Doctor
The Challenge


User with data analysis skills / computer scientist
Scientist
Start from any node
Cypher query language
Graph Data Science
Use case 1


Handle mapping identifiers of molecular entities
Knowledge Graph
Query "friends of a friend" on a gene level
Example: diabetes-relevant gene 'TCF7L2'
match path=(g:Gene{sid:'TCF7L2'})-[:MAPS|SYNONYM*0..2]-(g1:Gene) return path
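The `*0..2` part of the query bounds the traversal to at most two MAPS or SYNONYM hops from the start gene. A rough in-memory equivalent is a depth-limited expansion, sketched below. The edges and identifier strings are placeholders invented for this example, and the sketch tracks reachable nodes rather than full paths, so it is not a drop-in replacement for the Cypher query.

```python
# Hypothetical mapping/synonym edges around the start gene; the identifier
# strings are placeholders, not real database accessions.
MAPPING_EDGES = [
    ("TCF7L2", "id:ensembl-placeholder"),
    ("TCF7L2", "syn:alias-placeholder"),
    ("id:ensembl-placeholder", "id:entrez-placeholder"),
    ("id:entrez-placeholder", "id:other-placeholder"),
]


def within_hops(edges, start, max_hops):
    """Return {node: hop distance} for nodes reachable in 0..max_hops hops."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    reached = {start: 0}  # hop 0 is the start node itself, as in *0..2
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for node in frontier:
            for neighbour in sorted(adjacency.get(node, ())):
                if neighbour not in reached:
                    reached[neighbour] = hop
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return reached


print(within_hops(MAPPING_EDGES, "TCF7L2", 2))
```

Raising the upper bound widens the identifier neighbourhood, at the cost of pulling in more loosely related entries.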
Use case 2


Find information that is NOW connected
Knowledge Graph
Query for SNPs (mutations) associated with diabetes
Output: relevant protein and its function (ontology terms)
match (tr:Trait)
where tr.name contains 'diabetes mellitus'
with tr as disease
match path=(disease)<-[:ASSOCIATED_WITH_TRAIT]-(asso:Association)
  <-[:SNP_HAS_ASSOCIATION]-(snp:SNP)-[:SNP_HAS_GENE]-(gene:Gene)
  -[:MAPS]-(g1:Gene)-[x:CODES]->(transcript:Transcript)
  -[:CODES]->(prot:Protein)-[:ASSOCIATION]->(term:Term)--(o:Ontology)
return path
Use case 3


Using graph algorithms to infer new insights
Natural Language Processing
Ontologies
Knowledge Graph
Google's PageRank algorithm - find the most relevant gene
Finding ACE2 - the receptor the SARS-CoV-2 virus uses to enter the cell
• 140,000 abstracts from COVID-19-related publications
• Named entity recognition of gene names
• PageRank identified 'ACE2' as the most relevant gene
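The ranking step can be illustrated with a small power-iteration PageRank. The slides do not say how the gene graph was built from the abstracts, so the sketch assumes an undirected co-mention graph (two genes are linked when they appear in the same abstract). The edges below are invented for illustration and are not results from the 140,000 abstracts.

```python
# Hypothetical co-mention edges between gene symbols; not real results.
CO_MENTIONS = [
    ("ACE2", "TMPRSS2"), ("ACE2", "CTSL"), ("ACE2", "IL6"), ("ACE2", "TNF"),
    ("TMPRSS2", "CTSL"), ("IL6", "TNF"),
]


def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    neighbours = {node: set() for node in nodes}
    for left, right in edges:
        neighbours[left].add(right)
        neighbours[right].add(left)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour passes on an equal share of its current rank.
            incoming = sum(
                rank[other] / len(neighbours[other])
                for other in sorted(neighbours[node])
            )
            new_rank[node] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


RANKS = pagerank(CO_MENTIONS)
for gene, score in sorted(RANKS.items(), key=lambda item: -item[1]):
    print(f"{gene}: {score:.3f}")
```

In this toy graph the most connected gene ends up on top. On real data the same idea is available as the PageRank procedure of the Graph Data Science library named in the deck.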
Who’s this ACE2-guy?
source: https://www.benaroyaresearch.org/blog/post/11-things-know-about-mrna-vaccines-covid-19
Use case 4


Using node embeddings to sub phenotype diabetic patients
Natural
DZDconnect


connect raw data of diabetic patients with cancer
Clinical data from 404 diabetic patients
DZDconnect


connect lipidomics fingerprint
Lipidomics
Lipidomics experiment with 116 specific lipids
DZDconnect


connect transcriptomics fingerprint
Transcriptomics experiment with 58’345 specific Transcripts (RNAs)
Transform patients


Fast random projections (fastRP)
CALL gds.fastRP.write(
  'patients',
  {
    embeddingDimension: 50,
    writeProperty: 'fastrp-embedding'
  }
)
YIELD nodePropertiesWritten
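To give a feel for what the embedding call computes, here is a heavily simplified, pure-Python sketch of the fast random projection idea: each node starts with a sparse random vector, vectors are repeatedly averaged over neighbours and normalised, and the weighted intermediate results are summed. It is not the GDS implementation. The toy patient/lipid edges, the function name, the small dimension and the iteration weights are all assumptions made for this example.

```python
import math
import random

# Hypothetical bipartite toy graph: patients linked to measured lipids.
PATIENT_LIPID_EDGES = [
    ("patient 01", "lipid A"), ("patient 01", "lipid B"),
    ("patient 02", "lipid A"), ("patient 02", "lipid B"),
    ("patient 03", "lipid C"),
]


def simplified_fast_rp(edges, dimension=8, iteration_weights=(0.0, 1.0, 1.0), seed=42):
    """Return {node: embedding} for an undirected edge list (simplified FastRP)."""
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    neighbours = {node: set() for node in nodes}
    for left, right in edges:
        neighbours[left].add(right)
        neighbours[right].add(left)

    # Sparse random start vectors: +/-sqrt(3) with probability 1/6 each, else 0.
    scale = math.sqrt(3.0)
    choices = (-scale, 0.0, 0.0, 0.0, 0.0, scale)
    current = {node: [rng.choice(choices) for _ in range(dimension)] for node in nodes}
    embedding = {node: [0.0] * dimension for node in nodes}

    for weight in iteration_weights:
        propagated = {}
        for node in nodes:
            # Average the neighbours' vectors from the previous step.
            total = [0.0] * dimension
            for other in sorted(neighbours[node]):
                for i, value in enumerate(current[other]):
                    total[i] += value
            mean = [value / len(neighbours[node]) for value in total]
            length = math.sqrt(sum(value * value for value in mean))
            propagated[node] = [value / length for value in mean] if length > 0 else mean
        current = propagated
        # Add this iteration's vectors to the final embedding, scaled by its weight.
        for node in nodes:
            for i, value in enumerate(current[node]):
                embedding[node][i] += weight * value
    return embedding


EMBEDDINGS = simplified_fast_rp(PATIENT_LIPID_EDGES)
print(EMBEDDINGS["patient 01"] == EMBEDDINGS["patient 02"])  # same neighbourhood
```

Patients with identical neighbourhoods receive identical vectors here, which is the property the later similarity step relies on: patients with similar measurements end up close together in embedding space.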
Lipido
k-nearest neighbour clustering with k=5


representing the 5 diabetes subtypes
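The neighbour search over patient embeddings can be sketched as a brute-force k-nearest-neighbour lookup using cosine similarity. The two-dimensional vectors below are made up, and k is set to 1 only because the toy data has four patients; the slide uses k=5. Turning the resulting neighbour lists into subtype groups would need a further grouping step that the slides do not describe.

```python
import math

# Hypothetical 2-D embeddings; real embeddings would have 50 dimensions.
PATIENT_EMBEDDINGS = {
    "patient 01": [1.0, 0.0],
    "patient 02": [0.9, 0.1],
    "patient 03": [0.0, 1.0],
    "patient 04": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def k_nearest(embeddings, k):
    """Return {name: [k most similar other names]}, ties broken by name."""
    result = {}
    for name, vector in embeddings.items():
        scored = [
            (cosine(vector, other_vector), other_name)
            for other_name, other_vector in embeddings.items()
            if other_name != name
        ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[name] = [other_name for _, other_name in scored[:k]]
    return result


NEIGHBOURS = k_nearest(PATIENT_EMBEDDINGS, k=1)
print(NEIGHBOURS)
```

The brute-force comparison is quadratic in the number of patients, which is fine for a few hundred patients as in the deck but would need an approximate method at larger scale.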
[Figure: patients 01 to 05, regrouped by graph algorithms]
subphenotyping of diabetic patients
DZDconnect


connect patient data with knowledge graph
Transcript
Gene
Synonyms
Abstract
PubMed Article
Keyword
MeSH-term
Ontology term
Hello role-model :-)
Take home message
• Knowledge graph
  • as single point of truth
  • connect in-house data
  • scalability
  • infer new insights
• Use cases:
  • simple and advanced (Cypher) queries
  • Graph Data Science library (page rank, kNN)
  • Node embeddings for complex data
  • NLP
• Visualization of graph
  • different users
  • flask app, browser, SemSpect,…
Thanks to

