Graphium Chrysalis: Exploiting Graph Database Engines to Analyze RDF Graphs
Alejandro Flores
Maria-Esther Vidal
Guillermo Palma
Universidad Simón Bolívar
Graph-TA 2015
Agenda
• Motivation
• Graphium
• Graph Invariants in Graphium
Resource Description Framework (RDF) Model
[Figure: an RDF triple, drawn as a Subject node linked to an Object node by a Predicate edge]
Properties and relationships are represented as predicates.
[Figure: RDF graph about The Beatles. Resources: The Beatles, Let it be, Revolver, Help!. Predicates: created, year, duration. Other values: 1970, 35:16, 1965, 1966, 35:01, Liverpool, thebeatles.com]
Source: “Scaling Up Linked Data”. EUCLID project.
Semantic Data Management
RDF Graphs
RDF Engines
SPO SOP PSO POS OSP OPS
SPARQL queries that represent graph patterns
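The six labels above read like the subject/predicate/object orderings that many RDF engines index. The following is a minimal sketch under that assumption; the class `TripleIndex` and its methods are hypothetical names for illustration, not part of any engine mentioned in the slides. A triple pattern is answered from the ordering whose leading positions are bound.

```python
from collections import defaultdict
from itertools import permutations

# The six orderings, in the same order as on the slide.
ORDERS = ["".join(p) for p in permutations("SPO")]


class TripleIndex:
    """Toy in-memory store keeping one nested index per ordering (hypothetical)."""

    def __init__(self):
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in ORDERS
        }

    def add(self, s, p, o):
        term = {"S": s, "P": p, "O": o}
        for order in ORDERS:
            k1, k2, k3 = (term[c] for c in order)
            self.index[order][k1][k2].add(k3)

    def match(self, s=None, p=None, o=None):
        """Return triples matching a pattern; None marks a variable."""
        term = {"S": s, "P": p, "O": o}
        # Pick the ordering that puts bound positions first.
        order = "".join(sorted("SPO", key=lambda c: term[c] is None))
        idx = self.index[order]
        c1, c2, c3 = order
        results = []
        keys1 = list(idx) if term[c1] is None else [term[c1]]
        for k1 in keys1:
            level2 = idx.get(k1, {})
            keys2 = list(level2) if term[c2] is None else [term[c2]]
            for k2 in keys2:
                for k3 in level2.get(k2, ()):
                    if term[c3] is None or term[c3] == k3:
                        found = {c1: k1, c2: k2, c3: k3}
                        results.append((found["S"], found["P"], found["O"]))
        return results


store = TripleIndex()
for album in ("Let it be", "Revolver", "Help!"):
    store.add("The Beatles", "created", album)
```

Keeping all six orderings trades memory for the ability to answer any triple pattern with a prefix lookup.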
Property Graph Model
• Nodes and edges may have properties
• Properties: Key-value pairs
[Figure: property graph about The Beatles. Nodes: The Beatles, Let it be, Revolver, Help!. Edge label: created. Properties: Year: 1970, Duration: 35:16, Year: 1965, Year: 1966, Duration: 35:01, Homepage: thebeatles.com, Origin: Liverpool]
Source: “Scaling Up Linked Data”. EUCLID project.
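One common way to move from the RDF model to the property graph model is to turn triples with literal objects into node properties and all other triples into edges. The sketch below assumes that mapping; the slides do not state which mapping Graphium uses. The `Literal` wrapper, the `PropertyGraph` class, and the assignment of values to nodes in the sample data are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Literal:
    """Marks an object that is a plain value rather than a resource."""
    value: str


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # node id -> {key: value}
    edges: list = field(default_factory=list)  # (source, label, target)


def to_property_graph(triples):
    graph = PropertyGraph()
    for s, p, o in triples:
        graph.nodes.setdefault(s, {})
        if isinstance(o, Literal):
            # Literal object: store it as a key-value pair on the subject node.
            graph.nodes[s][p] = o.value
        else:
            # Resource object: keep it as a labeled edge between two nodes.
            graph.nodes.setdefault(o, {})
            graph.edges.append((s, p, o))
    return graph


# Sample data; which value belongs to which node is assumed for illustration.
triples = [
    ("The Beatles", "created", "Let it be"),
    ("The Beatles", "origin", Literal("Liverpool")),
    ("Let it be", "year", Literal("1970")),
]
graph = to_property_graph(triples)
```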
Semantic Data Management
Property Graphs
Graph Database Engines
Edges & Nodes
Neighborhoods
Graph-based tasks
Benchmark of Graphs
Graph Name               #Nodes     #Edges     Density     #Labels
DSJC1000.1 [Johnson91]   1,000      99,258     0.099       1
DSJC1000.5 [Johnson91]   1,000      499,652    0.50        1
DSJC1000.9 [Johnson91]   1,000      898,898    0.899       1
USA-road-d.NY            264,346    730,100    0.00001045  7,970
USA-road-d.FLA           1,070,376  2,687,902  0.00000235  22,704
Berlin10M                2,743,235  9,709,119  0.00000129  40
[Johnson91] Johnson, D., Aragon, C., McGeoch, L., and Schevon, C. Optimization by simulated annealing: an experimental evaluation; part II, graph coloring and number partitioning. Operations Research 39, 3 (1991), 378–406.
USA-road-d.* graphs: 9th DIMACS Implementation Challenge, Shortest Paths. http://www.dis.uniroma1.it/challenge9/download.shtml
Berlin10M: Berlin SPARQL Benchmark. http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
COLD 2013
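The Density column is consistent with dividing the number of edges by the number of possible edges in a complete digraph, n(n - 1). A small sketch of that arithmetic follows; the function name `digraph_density` is a hypothetical helper, and the slide values appear to be rounded or truncated.

```python
def digraph_density(num_nodes, num_edges):
    """Edges divided by the n * (n - 1) possible edges of a complete digraph."""
    if num_nodes < 2:
        return 0.0
    return num_edges / (num_nodes * (num_nodes - 1))


# (nodes, edges) pairs copied from the benchmark table.
benchmark = {
    "DSJC1000.1": (1_000, 99_258),
    "DSJC1000.5": (1_000, 499_652),
    "DSJC1000.9": (1_000, 898_898),
    "USA-road-d.NY": (264_346, 730_100),
    "USA-road-d.FLA": (1_070_376, 2_687_902),
    "Berlin10M": (2_743_235, 9_709_119),
}
densities = {name: digraph_density(n, m) for name, (n, m) in benchmark.items()}
```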
Adjacency Tests
Triple Pattern based Tests
K-Hop Tests
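The slides do not define these tests. As a rough sketch, assume an adjacency test asks whether an edge links two given nodes and a k-hop test collects the nodes reachable within k out-going edges. The function names and the sample edges are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Map each source node to the set of its out-going neighbors."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
    return adjacency


def is_adjacent(adjacency, source, target):
    return target in adjacency.get(source, ())


def k_hop(adjacency, start, k):
    """Nodes reachable from start in at most k hops (breadth-first search)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


adjacency = build_adjacency([("a", "b"), ("b", "c"), ("c", "d")])
```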
GRAPHIUM
[Figure: Graphium architecture. Labels: Neo4j, Sparksee, Graph-based API, RDF-based API, Data Mining, Traversal API, Graph Invariants]
GRAPHIUM: http://graphium.ldc.usb.ve/
Graph Invariants
Invariant: Description
Vertex and Edge Count: Number of vertices and edges in the graph.
Graph Density: Number of edges in the graph divided by the number of possible edges in a complete digraph.
Reciprocity: Measures the extent to which a triple that relates resources A and B is reciprocated by another triple that relates B with A.
In- and Out-degree Distribution: Distribution of the number of in-coming and out-going edges of the vertices of a graph.
In-coming and Out-going H-index: h is the maximum number such that h vertices have each at least h in-coming neighbors (resp., out-going neighbors) in the graph.
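The counting invariants in this table can be sketched directly over a list of triples. This is an illustration, not Graphium's implementation; `degree_summary` and the sample triples are hypothetical.

```python
from collections import Counter


def degree_summary(triples):
    """Return vertex count, edge count, and in-/out-degree distributions."""
    out_degree = Counter(s for s, _, _ in triples)
    in_degree = Counter(o for _, _, o in triples)
    vertices = set(out_degree) | set(in_degree)
    # Distribution: degree value -> number of vertices with that degree.
    out_dist = Counter(out_degree.get(v, 0) for v in vertices)
    in_dist = Counter(in_degree.get(v, 0) for v in vertices)
    return len(vertices), len(triples), out_dist, in_dist


sample = [
    ("a", "p", "b"),
    ("a", "p", "c"),
    ("b", "q", "c"),
]
num_vertices, num_edges, out_dist, in_dist = degree_summary(sample)
```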
Graph invariants
Reciprocity: Reciprocal edges indicate stronger relationships between vertices.
[Figure: Drugbank and Diseasome datasets linked by the predicates drugbank:possibleDiseaseTarget and diseasome:possibleDrug]
drugbank:DB00157 drugbank:possibleDiseaseTarget diseasome:diseases/0, diseasome:diseases/1, diseasome:diseases/4198, …
diseasome:diseases/0 diseasome:possibleDrug drugbank:DB00157
diseasome:diseases/1 diseasome:possibleDrug drugbank:DB00157
Reciprocity values less than 1.0 indicate that there are drugs associated with diseases that do not have their reciprocal link.
Reciprocity can be used to determine data quality and completeness.
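The slides give no formula for reciprocity. The sketch below assumes one plausible definition: the fraction of directed links (ignoring predicate labels) whose reverse link also exists. It uses only the five triples listed explicitly in the example and leaves the elided ones out, so the resulting value describes this toy subset, not the real datasets.

```python
def reciprocity(triples):
    """Fraction of directed links (s, o) for which (o, s) is also present."""
    links = {(s, o) for s, _, o in triples}
    if not links:
        return 0.0
    reciprocated = sum(1 for s, o in links if (o, s) in links)
    return reciprocated / len(links)


# Only the triples spelled out on the slide; the elided ones are omitted.
example = [
    ("drugbank:DB00157", "drugbank:possibleDiseaseTarget", "diseasome:diseases/0"),
    ("drugbank:DB00157", "drugbank:possibleDiseaseTarget", "diseasome:diseases/1"),
    ("drugbank:DB00157", "drugbank:possibleDiseaseTarget", "diseasome:diseases/4198"),
    ("diseasome:diseases/0", "diseasome:possibleDrug", "drugbank:DB00157"),
    ("diseasome:diseases/1", "diseasome:possibleDrug", "drugbank:DB00157"),
]
value = reciprocity(example)
```

Under this definition four of the five links are reciprocated; the link to diseasome:diseases/4198 has no counterpart, which is the kind of gap the slide associates with incompleteness.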
H-Index Sets
H-Index Set Out
A set F of vertices, where H is the maximum number such that the vertices in F have each at least H out-going neighbors.
[Figure: example graph with vertices S1 to S5 and O1 to O4 and edges P1 to P8]
Example: F = {S1, S2, S3}. The vertices in F have 3, 3, and 2 out-going neighbors, so 2 is the maximum number such that the vertices in F have each at least 2 out-going neighbors.
H-Index Set In
A set F of vertices, where H is the maximum number such that the vertices in F have each at least H in-coming neighbors.
Example: F = {O1, O2, O3}. The vertices in F have 3 in-coming neighbors each, so 3 is the maximum number such that the vertices in F have each at least 3 in-coming neighbors.
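A minimal sketch of how an H-index and its set could be computed from neighbor counts, reading F as all vertices with at least H neighbors. The edge list below is hypothetical: it was chosen to reproduce the degrees quoted in the example, since the original figure did not survive extraction.

```python
from collections import defaultdict


def h_index_set(neighbors):
    """neighbors: vertex -> set of neighbors. Returns (h, set F)."""
    degrees = sorted((len(n) for n in neighbors.values()), reverse=True)
    h = 0
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            h = rank  # at least `rank` vertices have `rank` or more neighbors
        else:
            break
    if h == 0:
        return 0, set()
    return h, {v for v, n in neighbors.items() if len(n) >= h}


# Hypothetical edges giving out-degrees 3, 3, 2, 1 and in-degrees 3, 3, 3.
edges = [
    ("S1", "O1"), ("S1", "O2"), ("S1", "O3"),
    ("S2", "O1"), ("S2", "O2"), ("S2", "O3"),
    ("S3", "O1"), ("S3", "O2"),
    ("S4", "O3"),
]
out_neighbors, in_neighbors = defaultdict(set), defaultdict(set)
for source, target in edges:
    out_neighbors[source].add(target)
    in_neighbors[target].add(source)

h_out, f_out = h_index_set(out_neighbors)
h_in, f_in = h_index_set(in_neighbors)
```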
Graph invariants
“References and GO annotations of the targets associated with the Micro Nutrient Drugs”
SELECT DISTINCT *
WHERE {
  ?s drugbank:drugCategory <http://wifo5-04.informatik.uni-mannheim.de/drugbank/resource/drugcategory/micronutrient> .
  ?s drugbank:target ?o .
  ?o drugbank:drugReference ?o2 .
  ?o drugbank:goClassificationComponent ?o3
}
Drugbank SPARQL endpoint times out
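To see why such a query can be expensive, the sketch below evaluates the same four-pattern shape over a tiny in-memory triple list with nested loops. The data and the function `evaluate` are hypothetical and predicate names are shortened; the point is that every target contributes the product of its reference count and its GO-term count to the result.

```python
from collections import defaultdict


def evaluate(triples, category):
    """Nested-loop evaluation of the drug -> target -> (reference, GO term) pattern."""
    objects = defaultdict(set)
    for s, p, o in triples:
        objects[(s, p)].add(o)
    drugs = sorted(
        s for (s, p), objs in objects.items()
        if p == "drugCategory" and category in objs
    )
    rows = []
    for s in drugs:
        for o in sorted(objects.get((s, "target"), ())):
            for o2 in sorted(objects.get((o, "drugReference"), ())):
                for o3 in sorted(objects.get((o, "goClassificationComponent"), ())):
                    rows.append((s, o, o2, o3))
    return rows


# Hypothetical toy data: one drug, two targets.
toy = [
    ("d1", "drugCategory", "micronutrient"),
    ("d1", "target", "t1"),
    ("d1", "target", "t2"),
    ("t1", "drugReference", "r1"),
    ("t1", "drugReference", "r2"),
    ("t1", "goClassificationComponent", "g1"),
    ("t1", "goClassificationComponent", "g2"),
    ("t1", "goClassificationComponent", "g3"),
    ("t2", "drugReference", "r1"),
    ("t2", "goClassificationComponent", "g1"),
]
rows = evaluate(toy, "micronutrient")
```

Here t1 alone yields 2 x 3 rows, so a few high-degree targets can dominate the result size.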
2-hop of Micro Nutrient Drugs
[Figure: 48 Drugs, 685 Targets, References, GO Terms]
H-Index Out: 10 Drugs have at least 57 out-going links
H-Index Out: 47 Targets have at least 57 out-going links
H-Index In: 6 References have at least 21 in-coming links
H-Index Sets can be used to explain query complexity
H-Index Sets to Validate Potential Novel Associations
H-Index Sets
[Figure: network of Targets and Drugs]
Targets: A set F of targets, where H is the maximum number such that the targets in F have each at least H out-going neighbors.
Drugs: A set F of drugs, where H is the maximum number such that the drugs in F have each at least H in-coming neighbors.
Set of Targets and Drugs
• 900 Drugs, 1,000 Targets and 5,000 Interactions: Nuclear receptor, G protein-coupled receptors (GPCRs), Ion channels, and Enzymes.
• DrugBank
K. Bleakley and Y. Yamanishi. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics, 25(18), 2009.
GPCR
Drugs: 223
Targets: 95
Interactions: 635
Avg Interaction per Target: 6.68
Avg Interaction per Drug: 2.84
Drugbank Drugs in the dataset of G protein-coupled receptors (GPCRs)
H-index Out is 14
15 Targets are in the H-Index Set Out
F = {hsa:1128, hsa:1129, hsa:146, hsa:147, hsa:148, hsa:150, hsa:151, hsa:152, hsa:153, hsa:154, hsa:155, hsa:1812, hsa:1813, hsa:3269, hsa:3356}
H-Index Sets
Associations between Drugs and Targets that are not in Drugbank:
D02076 hsa:146
D02076 hsa:147
D00604 hsa:147
The targets hsa:146 and hsa:147 belong to the H-index Set.
Validated in STITCH http://stitch.embl.de/
H-Index Sets can be used to validate the discovered associations.
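The slides do not spell out the validation procedure. One simple reading, sketched here as an assumption, is to keep a candidate drug-target pair when it is not already known and its target lies in the H-Index Set Out. The function name and the fourth candidate (`D99999`, `hsa:9999`) are made up for contrast; the other identifiers come from the slides.

```python
H_INDEX_SET_OUT = {
    "hsa:1128", "hsa:1129", "hsa:146", "hsa:147", "hsa:148",
    "hsa:150", "hsa:151", "hsa:152", "hsa:153", "hsa:154",
    "hsa:155", "hsa:1812", "hsa:1813", "hsa:3269", "hsa:3356",
}


def supported_candidates(candidates, known, h_index_set):
    """Keep unknown (drug, target) pairs whose target is in the H-index set."""
    return [
        (drug, target)
        for drug, target in candidates
        if (drug, target) not in known and target in h_index_set
    ]


candidates = [
    ("D02076", "hsa:146"),
    ("D02076", "hsa:147"),
    ("D00604", "hsa:147"),
    ("D99999", "hsa:9999"),  # hypothetical pair outside the set
]
known_interactions = set()  # the slide states these pairs are not in Drugbank
supported = supported_candidates(candidates, known_interactions, H_INDEX_SET_OUT)
```

Pairs that pass this filter would still need external confirmation, as the slide does with STITCH.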
Visit our website: http://graphium.ldc.usb.ve/
Conclusions
Graph Invariants:
• Remain the same for any two isomorphic graphs, regardless of representation.
• Allow for uncovering hidden properties of the graphs.
Reciprocity: can suggest data quality and incompleteness.
Density: can be used to explain the complexity of graph tasks.
H-Index Set: can comprise entities useful to discover potential novel associations.
