THoSP: an Algorithm for Nesting Property Graphs
Giacomo Bergami¹, André Petermann², Danilo Montesi¹
¹Università di Bologna, ²Universität Leipzig
1st Joint GRADES-NDA International Workshop, 2018
10th June 2018
Key Ideas
Key Ideas – Research Problem
1 An operator generalizing the current “grouping” and “nesting”
operations is missing. Current (G)DBMSs can express nesting
operations, but their query plans cannot optimize the whole
process by combining the following tasks:
• path joins, evaluated separately for both patterns.
• grouping to create an id collection over the matched elements.
2 The general nesting algorithm could lead to an exponential
evaluation time.
Key Ideas – Use Case
Vertex Pattern: Author -[authorOf]-> Paper∗
Edge Pattern: Authorsrc -[authorOf]-> Paper∗ <-[authorOf]- Authordst, with Authorsrc ≠ Authordst
Input Bibliography Network:
• Vertex 0: Author (name: Abigail, surname: Conner)
• Vertex 1: Author (name: Baldwin, surname: Oliver)
• Vertex 2: Author (name: Cassie, surname: Norman)
• Vertex 3: Paper (title: On Joining Graphs)
• Vertex 4: Paper (title: Object Databases)
• Vertex 5: Paper (title: On Nesting Graphs)
• Edges 6, 7, 8, 9, 10: AuthorOf
Key Ideas – Desired Result
Expected result:
• Nested vertices (0), (1) and (2), one for each of Author 0 (Abigail Conner), Author 1 (Baldwin Oliver) and Author 2 (Cassie Norman).
• Nested coauthorship edges (0 → 1), (1 → 0) and (0 → 2), (2 → 0).
• Papers 3 (On Joining Graphs), 4 (Object Databases) and 5 (On Nesting Graphs) appear as nested content.
Key Ideas – Research Goals
1 As for graph joins, the data model must improve the
serialization of both the operands and the resulting graph.
2 The logical graph nesting operator must be general enough to
support both the THoSP algorithm and other graph
summarization tasks.
3 Grouping can be avoided by defining a nesting index, which
associates each contained element to its container. This can
be achieved by extending the Graph Join’s data structures with
such an index.
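The nesting index of goal 3 can be pictured with a minimal sketch. The class name `NestingIndex`, its methods and the container ids `"n0"`/`"n1"` are illustrative assumptions, not the authors' data structure: the point is only that matches are appended as (container, content) pairs while visiting, so no grouping operator runs during matching.

```python
from collections import defaultdict


class NestingIndex:
    """Append-only list of (container_id, content_id) pairs (hypothetical layout)."""

    def __init__(self):
        self.pairs = []  # written in visiting order, never regrouped

    def write(self, container, content):
        # Associate one contained element with its container.
        self.pairs.append((container, content))

    def members(self):
        # Containment is recovered afterwards with a single scan of the index.
        out = defaultdict(list)
        for container, content in self.pairs:
            out[container].append(content)
        return dict(out)


index = NestingIndex()
index.write("n0", 3)  # "n0" and "n1" are made-up container ids
index.write("n1", 4)
index.write("n0", 4)
contents = index.members()
```

Here `contents` maps each container to the ids written for it, in write order.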
Logical Model
Logical Model – Design (1)
The nested (property) graph data model extends the logical model
for graph joins. Therefore, we want to preserve the same assumptions:
• The resulting nested graph is not a materialized view (as in
SQL’s SELECT).
• The nested graph is serialized using only the ID information.
• Attributes, values and labels can be completely reconstructed
from this information and the pattern rewriting information.
Logical Model – Design (2)
The following modelling choices allow the reconstruction of the
required pieces of information:
• Vertices and edges are uniquely identified by ids in $\mathbb{N}^2$.
• A nested graph database is a property graph, where each vertex
and edge may contain (nest) another property graph ($\nu$, $\epsilon$).
• Each vertex or edge within the graph can be considered as a
possible graph operand.
Logical Model – Definition
Graph Nesting
A nested graph database is a nested graph, where each vertex and edge may
represent a graph. Given a nested graph $G = (V, E)$, a vertex pattern $g_V$ and an
edge pattern $g_E$, both containing grouping references:
$$\eta^{keep}_{\iota}(G) = \Big( \{ v \in V \mid g_V(v) = \emptyset,\ keep \} \cup \iota(g_V(G)),\ \{ e \in E \mid g_E(e) = \emptyset,\ keep \} \cup \iota(g_E(G)) \Big)$$
where $\iota$ is an indexing function associating each matched graph with a
single new identifier not appearing in $G$, and $keep$ is set to true when
the non-traversed vertices and edges must be preserved in the final graph.
The newly generated nested graph is inserted into the graph database which
also contains $G$. Values associated to both nested vertices and edges are
determined by user-defined functions.
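The definition can be read operationally on plain id sets. The sketch below is an illustration under assumptions: the function name `nest_ids`, integer ids, and the representation of pattern matches as lists of id sets are all hypothetical. Unmatched elements survive only when `keep` is true, and each matched subgraph receives one fresh identifier, playing the role of the indexing function.

```python
from itertools import count


def nest_ids(V, E, match_v, match_e, keep, fresh_id):
    """match_v / match_e: lists of id sets, one per matched subgraph (assumed input)."""
    matched_v = set().union(*match_v)
    matched_e = set().union(*match_e)
    # Indexing function: one new identifier per matched subgraph.
    new_v = {fresh_id(): set(m) for m in match_v}
    new_e = {fresh_id(): set(m) for m in match_e}
    # Non-traversed elements are preserved only when keep is true.
    kept_v = V - matched_v if keep else set()
    kept_e = E - matched_e if keep else set()
    return kept_v | set(new_v), kept_e | set(new_e), new_v, new_e


ids = count(100)  # fresh ids start at 100, outside the sample graph
V2, E2, new_v, new_e = nest_ids({0, 1, 2, 9}, {5, 6}, [{0, 1}], [], True, lambda: next(ids))
V3, E3, _, _ = nest_ids({0, 1, 2, 9}, {5, 6}, [{0, 1}], [], False, lambda: next(ids))
```

With `keep` true the unmatched vertices 2 and 9 stay next to the new nested vertex; with `keep` false only the nested vertex remains.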
THoSP Algorithm
THoSP Algorithm – Physical Model
Motivations:
1 Reduce the number of graph visits by visiting the
subpattern first, and then extending the visit to the remaining
patterns.
2 Represent the nested graph as an adjacency list enriched with
an external nesting index.
The algorithm uses the same principles that were adopted for
implementing graph joins:
• Use memory mapping (OS buffering).
• Serialized graphs store each vertex together with both its ingoing
and outgoing edges.
• No additional indexing structures are used.
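A rough picture of these principles, written in Python rather than the authors' C++: each vertex record carries its outgoing and ingoing neighbours, the file is read through a memory map, and the scan is purely sequential with no extra index. The record layout (three little-endian 32-bit counters followed by neighbour ids) and the function names are assumptions made for this sketch, and neighbour ids stand in for edges to keep it short.

```python
import mmap
import os
import struct
import tempfile


def serialize(adj, path):
    # adj: {vertex_id: (outgoing_ids, ingoing_ids)}; this layout is an assumption.
    with open(path, "wb") as f:
        for v, (out, inc) in sorted(adj.items()):
            f.write(struct.pack("<III", v, len(out), len(inc)))
            f.write(struct.pack(f"<{len(out)}I", *out))
            f.write(struct.pack(f"<{len(inc)}I", *inc))


def scan(path):
    # Sequential scan over the memory-mapped file; the OS handles buffering.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        off = 0
        while off < len(m):
            v, n_out, n_in = struct.unpack_from("<III", m, off)
            off += 12
            out = struct.unpack_from(f"<{n_out}I", m, off)
            off += 4 * n_out
            inc = struct.unpack_from(f"<{n_in}I", m, off)
            off += 4 * n_in
            yield v, list(out), list(inc)


adj = {0: ([3], []), 2: ([3], []), 3: ([], [0, 2])}  # toy graph, not the benchmark data
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "graph.bin")
    serialize(adj, path)
    records = list(scan(path))
```

Keeping ingoing neighbours in the record is what lets a visit that starts from a paper reach its authors without a separate reverse index.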
THoSP Algorithm – Example
Animation over the input bibliography network (Authors 0 to 2, Papers 3 to 5, AuthorOf edges 6 to 10), building the nested result step by step:
1 Input graph only.
2 Paper 3 (On Joining Graphs) is added to the result.
3 Author 0 (Abigail Conner) is added, with nested vertex (0).
4 Author 2 (Cassie Norman) is added.
5 The coauthorship edge (0 → 2), (2 → 0) is added.
6 Paper 4 (Object Databases) is added, and nested vertex (2) appears.
7 Author 1 (Baldwin Oliver) is added, with nested vertex (1).
8 The coauthorship edge (0 → 1), (1 → 0) is added.
9 Paper 5 (On Nesting Graphs) is added, completing the expected result.
Experimental Evaluation
Experimental Evaluation – Dataset
We want to show that the combination of THoSP with the proposed
physical data model outperforms the query plans of other query
languages (Cypher, SPARQL, SQL, AQL).
We performed our tests on both synthetic and real-world data, using
operands with $10^n$ vertices, for $n = 1, \dots, 8$:
• GMark graph generator.
• Random samples of Microsoft Academic Graph.
Our tests’ source code is available at:
https://bitbucket.org/unibogb/graphnestingc/src
Experimental Evaluation – Competing DataBases
Given that the only graph database using Java was the worst
performing one, we implemented our solution only in C++. The
graph nesting operator was implemented in each DB language by
returning ID collections.
• PostgreSQL was used to evaluate SQL queries. We ran the
queries directly in psql.
• SPARQL queries were evaluated over Virtuoso. SPARQL
queries were sent via ODBC (C++).
• Cypher queries were evaluated over Neo4J. Cypher queries
were sent via the execute method.
• AQL queries were evaluated over ArangoDB. We ran the
queries directly in arangosh.
Experimental Evaluation – GMark Benchmark
Operands size: |V|, #Subgraph. Two-hop separated pattern time (C/C++), in ms: remaining columns.
|V|   #Subgraph   SQL+JSON    SPARQL      AQL         Cypher      THoSP
10    3           2.10        11          15.00       681.40      0.11
10^2  58          9.68        63          3.89        1,943.98    0.14
10^3  968         17.96       63          12.34       >3.60×10^6  0.46
10^4  8,683       69.27       364         46.74       >3.60×10^6  4.07
10^5  88,885      294.23      4,153       508.87      >3.60×10^6  43.81
10^6  902,020     2,611.48    50,341      7,212.19    >3.60×10^6  563.02
10^7  8,991,417   25,666.14   672,273     922,590.00  >3.60×10^6  8,202.93
10^8  89,146,891  396,523.88  >3.60×10^6  >3.60×10^6  >3.60×10^6  91,834.20
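To read the GMark figures as ratios, the short computation below divides the SQL+JSON time by the THoSP time for the three largest operands. The numbers are copied from the table; the variable names are only for illustration, and SQL+JSON is picked because it is the only competitor with a measured time in all three rows.

```python
# (|V| exponent, SQL+JSON ms, THoSP ms), copied from the GMark table.
rows = [
    (6, 2_611.48, 563.02),
    (7, 25_666.14, 8_202.93),
    (8, 396_523.88, 91_834.20),
]

# Ratio of SQL+JSON time to THoSP time for each operand size.
speedups = {exp: sql / thosp for exp, sql, thosp in rows}
for exp, ratio in speedups.items():
    print(f"|V| = 10^{exp}: SQL+JSON / THoSP = {ratio:.2f}")
```

On these rows THoSP is between roughly three and five times faster than the SQL+JSON plan.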
Experimental Evaluation – Microsoft Academic Graph Benchmark
Operands size: |V|, #Subgraph. Two-hop separated pattern time (C/C++), in ms: remaining columns.
|V|   #Subgraph    SQL+JSON   SPARQL     AQL         Cypher     THoSP
10    19           1.69·10^0  3.4·10^1   6.57·10^-1  2.38·10^3  2.82·10^-1
10^2  255          1.75·10^0  3.22·10^2  2.51·10^0   1.01·10^4  3.46·10^-1
10^3  23,119       4.71·10^1  1.22·10^3  8.18·10^1   >1H        1.39·10^1
10^4  5,411,205    1.53·10^4  2.77·10^5  2.08·10^4   >1H        2.58·10^3
10^5  97,079,329   1.20·10^6  >1H        OOM¹        >1H        1.97·10^5
10^6  241,448,529  >1H        >1H        OOM¹        >1H        6.22·10^5
10^7  361,759,509  OOM²       >1H        OOM¹        >1H        7.74·10^5
Experimental Evaluation – Results
• These further benchmarks show that none of the current data models
supporting nested representations provides query plans
tailored to this specific case of (graph) nesting.
• The proposed approach extends the secondary-memory
property graph representation by adding associations to nested
vertices and edges.
• The serialized data structure provides a graph with an
external containment data structure.
• This data model achieves structural aggregation for graph data,
where aggregated data may preserve the original vertices and
edges.
Experimental Evaluation – Further Results
• GROQ: THoSP can be extended into a more general
algorithm.
• Generalized Semistructured Model: this data structure can be
generalized into a broader data representation.
Experimental Evaluation – Future Work
• GROQ: further benchmarks have to be carried out over this
more general nesting algorithm.
• General Nesting: provide a query plan where either grouping or
GROQ is used.
Backup Slides
Backup Slides – Nested Graph Database
Nested Graph DataBase
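Given a set $\Sigma^*$ of strings, a nested (property) graph database $G$ is a tuple
$G = \langle V, E, \lambda, \ell, \omega, \nu, \epsilon \rangle$, where:
• $V, E \subseteq \mathbb{N}^2$ s.t. $V \cap E = \emptyset$
• source and target $\lambda \colon E \to V^2$
• labelling $\ell \colon V \cup E \to \wp(\Sigma^*)$
• object mapping $\omega \colon V \cup E \to \Omega$
• vertices’ containment $\nu \colon (V \cup E) \to \wp(V)$
• edges’ containment $\epsilon \colon (V \cup E) \to \wp(E)$
Each vertex or edge $o \in V \cup E$ induces a nested (property) graph as the
following pair:
$$G_o = \Big\langle \nu(o),\ \big\{ e \in \epsilon(o) \;\big|\; \lambda(e) \in \big(\textstyle\bigcup_{n \ge 0} \nu^{(n)}(\{o\})\big)^2 \big\} \Big\rangle$$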
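The induced graph can be computed directly from the two containment functions. The following is a sketch under assumptions: ids are plain integers instead of pairs of naturals, labels and object values are omitted, and the names `NestedGraphDB`, `lam`, `nu` and `eps` (source/target, vertex containment and edge containment) are chosen here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NestedGraphDB:
    V: set                                   # vertex ids
    E: set                                   # edge ids, disjoint from V
    lam: dict                                # edge id -> (source, target)
    nu: dict = field(default_factory=dict)   # object id -> contained vertex ids
    eps: dict = field(default_factory=dict)  # object id -> contained edge ids

    def closure(self, o):
        # Union over n >= 0 of the n-fold vertex containment applied to {o}.
        seen, frontier = {o}, {o}
        while frontier:
            frontier = {c for x in frontier for c in self.nu.get(x, ())} - seen
            seen |= frontier
        return seen

    def induced(self, o):
        # Keep the edges contained in o whose endpoints both lie in the closure.
        inside = self.closure(o)
        edges = {e for e in self.eps.get(o, ())
                 if self.lam[e][0] in inside and self.lam[e][1] in inside}
        return set(self.nu.get(o, ())), edges


db = NestedGraphDB(V={0, 1, 2, 3}, E={10, 11},
                   lam={10: (1, 2), 11: (1, 3)},
                   nu={0: {1, 2}}, eps={0: {10, 11}})
vertices, edges = db.induced(0)
```

In this toy database edge 11 is dropped from the graph induced by vertex 0, because its target 3 is not nested under 0.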
THoSP Pseudocode
nest(Cont, patt, u, S):
    for each s in S s.t. patt.doSerialize(s):
        Cont.write(<u, s>)

Input: G, gV, gE
Cont ← ∅
NestedGraph ← ∅
a ← V ∩ E \ (γV ∪ γsrc_E ∪ γdst_E);
for each v: vertex in G s.t. a(v):
    for each V(u →e v):
        u := dtl(u)_c; nest(Cont, V, u, {u, e, v})
        NGraph(V) ← NGraph(V) ∪ {u}
        for each V(w →e v) s.t. E(u →e v e← w):
            w := dtl(w)_c;
            e' := dtl(u, w)_c;
            nest(Cont, E, e', {u, e, v, e', w})
            NGraph(E) ← NGraph(E) ∪ {u →e' w}
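For the coauthorship use case, the visiting order of the pseudocode can be sketched in Python as below. This is a simplified illustration, not the authors' C++ implementation: tuple ids such as `("v", u)` stand in for the `dtl` function, the pattern objects are hard-coded, and the endpoints of the AuthorOf edges 6 to 10 are hypothetical, since the slides only give their ids.

```python
from itertools import permutations


def thosp_coauthorship(author_of):
    """author_of: iterable of (edge_id, author_id, paper_id) triples."""
    incoming = {}  # paper -> [(author, edge)], i.e. the ingoing adjacency
    for e, a, p in author_of:
        incoming.setdefault(p, []).append((a, e))

    cont = []  # nesting index: (container id, contained paper id)
    nested_vertices, nested_edges = set(), set()
    for p, authors in incoming.items():      # visit the shared subpattern (Paper) once
        for u, _ in authors:                 # vertex pattern: u -authorOf-> p
            nested_vertices.add(("v", u))
            cont.append((("v", u), p))
        for (u, _), (w, _) in permutations(authors, 2):  # edge pattern, distinct authors
            nested_edges.add(("e", u, w))
            cont.append((("e", u, w), p))
    return nested_vertices, nested_edges, cont


# Hypothetical endpoints for AuthorOf edges 6..10 (assumption, see above).
edges = [(6, 0, 3), (7, 2, 3), (8, 0, 4), (9, 1, 4), (10, 2, 5)]
nv, ne, cont = thosp_coauthorship(edges)
```

Each paper is visited once, and both the nested vertices and the coauthorship edges are emitted from that single visit, with containment appended to `cont` instead of being grouped afterwards.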
