www.moving-project.eu
TraininG towards a society of data-saVvy inforMation prOfessionals to enable open leadership INnovation
Till Blume and Ansgar Scherp
ZBW – Leibniz Information Centre for Economics
Christian-Albrechts-Universität zu Kiel
Towards Flexible Indices for
Distributed Graph Data:
The Formal Schema-level Index Model FLuID
May 23rd, 2018, 30th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken),
22.05.2018 - 25.05.2018, Wuppertal, Germany.
Why use a Schema-level Index?
[Figure: a user ("I want more metadata! Where to get it from?"), the index, and a data source, connected by steps 1 and 2. Query structure: bibo:Book, dct:creator, foaf:Agent, dct:subject. Example data: "Towards a clean air policy", "Great Britain. Central Electricity", nodes URI-1, URI-2, URI-3.]
Problem:
• We are looking for a specific kind of metadata, e.g., about books.
• We do not know in which databases we can find such metadata.
• We need an index that can be queried to find matching databases.
Solution:
• A schema-level index (SLI) summarizes data by storing information about how the data is modelled in a specific database.
• We formulate a structural query to find matching databases.
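As a rough illustration of the idea, the following sketch builds a toy schema-level index that maps a schema structure (here simply the set of rdf:type objects and the set of other properties of a subject) to the data sources that use it, and then answers a structural query. The function names, the example.org source URLs, and this particular choice of schema structure are assumptions made for the example; they are not taken from the FLuID implementation.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def build_index(quads):
    """Map each schema structure (types, properties) to the sources using it.

    `quads` is an iterable of (subject, property, object, source) tuples.
    """
    types = defaultdict(set)    # subject -> set of rdf:type objects
    props = defaultdict(set)    # subject -> set of other properties
    sources = defaultdict(set)  # subject -> sources that mention it
    for s, p, o, src in quads:
        sources[s].add(src)
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            props[s].add(p)
    index = defaultdict(set)
    for s in sources:
        key = (frozenset(types[s]), frozenset(props[s]))
        index[key] |= sources[s]
    return index

def query(index, wanted_types, wanted_props):
    """Return all sources whose schema structure contains the wanted terms."""
    hits = set()
    for (ts, ps), srcs in index.items():
        if set(wanted_types) <= ts and set(wanted_props) <= ps:
            hits |= srcs
    return hits

# Hypothetical data: two sources, only one describes books with creators.
quads = [
    ("URI-1", "rdf:type", "bibo:Book", "http://example.org/source-a"),
    ("URI-1", "dct:creator", "URI-2", "http://example.org/source-a"),
    ("URI-1", "dct:subject", "URI-3", "http://example.org/source-a"),
    ("URI-7", "rdf:type", "foaf:Agent", "http://example.org/source-b"),
]
index = build_index(quads)
result = query(index, {"bibo:Book"}, {"dct:creator", "dct:subject"})
print(result)
```

The query contains no titles or author names; it only describes how the wanted data is modelled.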
Real World Application Scenario
[Figure: the MOVING platform, the index, and a data source, connected by steps 1 to 3. Query structure: bibo:Book, dct:creator, foaf:Agent, dct:subject. Example data: "Towards a clean air policy", "Great Britain. Central Electricity", nodes URI-1, URI-2, URI-3.]
MOVING search scenario:
• The MOVING platform (http://platform.moving-project.eu) provides a search for bibliographic resources.
• We harvest bibliographic metadata using different SLIs.
• Such metadata is of great value because:
  • We can obtain good search results relying solely on the title [3].
  • We can complement existing metadata.
  • We can train machine learning models to further improve the search [4].
Real World Application Scenario
[Figure: the MOVING platform, the index, and two data sources, connected by steps 1 to 3. Query structure: bibo:Book, dct:creator, foaf:Agent, dct:subject. Example data of the first source: "Towards a clean air policy", "Great Britain. Central Electricity", nodes URI-1, URI-2, URI-3. Example data of the second source: "Proceedings of the …", "Benjamin Elizalde", bibo:Proceedings, foaf:Agent, dct:subject, nodes URI-6, URI-8, URI-9.]
MOVING search scenario:
• Which SLIs are best suited to find bibliographic metadata in the Web of Data?
• Can we find semantically similar databases as well?
Motivation for FLuID
• Existing schema-level indices (SLIs) all summarize data differently and for different purposes, and they lack a common formalization [1,2,5,7-11]. For example, they consider:
  • incoming and outgoing properties (edges)
  • properties (edge labels) and objects (target nodes)
  • types
  • types and properties
  • …
• Without a common ground, it is difficult to develop new indices and compare them to existing ones.
• Even for a single application scenario, a single SLI may not be sufficient, since how the data is modelled can vary a lot [6].
The FLuID Model
Approach
• Abstract from the related work (bottom-up): Find generic, simple patterns in existing SLIs and use them as basic building blocks to define all (complex) schema structures that exist in previous SLIs.
• MOVING search scenario (top-down): Flexibly define indices that can reflect semantic information and can be computed efficiently.
Solution
1. We formalized our building blocks using equivalence relations over a directed, edge-labeled multigraph (RDF graph).
2. We demonstrated how to model existing works and beyond.
3. We showed the scalability by conducting a complexity analysis.
The FLuID Model
• FLuID provides 7 schema elements:
  • 3 simple elements: Object Cluster (OC), Property Cluster (PC), and Property-Object Cluster (POC)
  • 3 undirected elements: u-OC, u-PC, and u-POC
  • 1 Complex Schema Element (CSE)
• FLuID provides 4 parameterizations:
  • Label parameterization
  • Chaining parameterization
  • Ontology parameterization
  • Instance parameterization
• In total, FLuID provides 11 building blocks, sufficient to model all existing approaches and beyond.
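The three simple schema elements can be pictured as three different grouping keys over the outgoing edges of a subject. The sketch below is a minimal illustration under assumptions: the slides only spell out the Object Cluster, so the readings used here for PC (same set of properties) and POC (same set of property-object pairs) are inferred from the names, and all function names are hypothetical.

```python
from collections import defaultdict

def instances(triples):
    """Group (subject, property, object) triples by subject."""
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[s].append((p, o))
    return groups

# One key function per simple schema element (PC and POC are assumed readings).
def oc_key(edges):
    return frozenset(o for _, o in edges)  # objects only

def pc_key(edges):
    return frozenset(p for p, _ in edges)  # properties only

def poc_key(edges):
    return frozenset(edges)                # (property, object) pairs

def summarize(triples, key):
    """Partition subjects into classes that share the same key."""
    classes = defaultdict(set)
    for s, edges in instances(triples).items():
        classes[key(edges)].add(s)
    return sorted(sorted(c) for c in classes.values())

triples = [
    ("i1", "p1", "a"), ("i1", "p2", "b"),
    ("i2", "p1", "a"), ("i2", "p2", "b"),
    ("i3", "p2", "a"), ("i3", "p1", "b"),
    ("i4", "p1", "c"), ("i4", "p2", "d"),
]
oc = summarize(triples, oc_key)    # i1, i2, i3 share the objects {a, b}
pc = summarize(triples, pc_key)    # all four use the properties {p1, p2}
poc = summarize(triples, poc_key)  # only i1 and i2 agree on the pairs
print(oc, pc, poc)
```

Swapping the key function changes how coarse the summary is without touching the rest of the computation.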
FLuID: Equivalence Relation Approach
• Instances: sets of edges <s,p,o> with the same subject node s, i.e., ((i1, p1, o1), (i2, p2, o2)) ∈ I ⇔ i1 = i2.
• Each edge belongs to exactly one instance; nodes do not necessarily.
• Since instances partition the data graph, a partition of the set of instances also partitions the data graph.
[Figure: example data graph with nodes i1 to i10 and edges labeled p1, p2, and p3.]
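The partition property of instances can be checked on a small example. This is a minimal sketch with hypothetical names and made-up triples: edges are grouped by subject, every edge ends up in exactly one group, and a node that appears as an object can be touched by several groups.

```python
from collections import defaultdict

def partition_into_instances(triples):
    """Return {subject: [triples with that subject]}."""
    inst = defaultdict(list)
    for t in triples:
        inst[t[0]].append(t)
    return dict(inst)

triples = [
    ("i1", "p1", "i2"),
    ("i1", "p2", "i3"),
    ("i3", "p1", "i2"),
    ("i3", "p3", "i4"),
]
inst = partition_into_instances(triples)

# Every edge lands in exactly one instance ...
edge_count = sum(len(edges) for edges in inst.values())
# ... while the node i2 is the object of edges in more than one instance.
instances_touching_i2 = sorted(
    s for s, edges in inst.items() if any(o == "i2" for _, _, o in edges)
)
print(sorted(inst), edge_count, instances_touching_i2)
```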
The FLuID Model
• Object Cluster: summarizes instances that share the same set of connected objects, i.e.,
([i1]_I, [i2]_I) ∈ OC ⇔ ∀(i1, p1, o1) ∃(i2, p2, o2): o1 = o2 ∧ ∀(i2, p2, o2) ∃(i1, p1, o1): o1 = o2
[Figure: example data graph with nodes i1 to i10 and edges labeled p1, p2, and p3.]
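The two-sided condition of the Object Cluster can be read literally as code. The sketch below (hypothetical function names, made-up edges) checks both directions of the forall/exists condition; the result coincides with comparing the two sets of objects for equality, while the properties are ignored.

```python
def objects(instance):
    """Objects of an instance given as a list of (property, object) edges."""
    return [o for _, o in instance]

def oc_equivalent(inst1, inst2):
    """Literal reading of the two-sided forall/exists condition."""
    forward = all(any(o1 == o2 for o2 in objects(inst2)) for o1 in objects(inst1))
    backward = all(any(o1 == o2 for o1 in objects(inst1)) for o2 in objects(inst2))
    return forward and backward

a = [("p1", "x"), ("p2", "y")]
b = [("p3", "y"), ("p1", "x"), ("p2", "x")]
c = [("p1", "x")]

same_ab = oc_equivalent(a, b)  # properties differ, object sets agree
same_ac = oc_equivalent(a, c)  # c lacks the object y
print(same_ab, same_ac)
```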
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is p1.
[Figure: example data graph with nodes i1 to i10 and edges labeled p1, p2, and p3.]
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type.
[Figure: example data graph with nodes i1 to i10, edges labeled p2, p3, and rdf:type, and the class labels bibo:Book, foaf:Agent, and bibo:Proceedings.]
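A label-parameterized Object Cluster restricted to rdf:type groups subjects by their set of types. The following sketch assumes a hypothetical function name and made-up triples over the example vocabulary; it is meant as an illustration, not as the FLuID implementation.

```python
from collections import defaultdict

def label_parameterized_oc(triples, labels):
    """Cluster subjects by the objects of edges whose property is in `labels`."""
    objs = {}
    for s, p, o in triples:
        seen = objs.setdefault(s, set())  # every subject gets an entry
        if p in labels:
            seen.add(o)
    clusters = defaultdict(set)
    for s, object_set in objs.items():
        clusters[frozenset(object_set)].add(s)
    return {key: sorted(members) for key, members in clusters.items()}

triples = [
    ("i1", "rdf:type", "bibo:Book"), ("i1", "dct:creator", "i2"),
    ("i2", "rdf:type", "foaf:Agent"),
    ("i3", "rdf:type", "bibo:Book"), ("i3", "dct:creator", "i4"),
    ("i4", "rdf:type", "foaf:Agent"),
    ("i5", "rdf:type", "bibo:Proceedings"),
]
by_type = label_parameterized_oc(triples, {"rdf:type"})
by_all = label_parameterized_oc(triples, {"rdf:type", "dct:creator"})
print(by_type)
```

With only rdf:type considered, i1 and i3 fall into the same cluster; once dct:creator is included as well, their object sets differ (i2 versus i4) and they are separated.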
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type.
• Ontology parameterization: RDFS Schema Graph
[Figure: example data graph with nodes i1 to i10, edges labeled p2, p3, and rdf:type, and the class labels bibo:Book, foaf:Agent, and bibo:Proceedings.]
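One way to picture the ontology parameterization is to extend the asserted types of a subject with types inferred from an RDFS schema graph before clustering. The slide does not say which RDFS entailments are used, so this sketch assumes rdfs:subClassOf only; the superclass ex:Document and the subclass edges are invented for illustration and are not claims about the BIBO ontology.

```python
def superclasses(cls, sub_class_of):
    """All classes reachable from `cls` via rdfs:subClassOf, including cls."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return seen

def inferred_types(triples, sub_class_of):
    """Map each subject to its asserted plus inferred rdf:type objects."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).update(superclasses(o, sub_class_of))
    return types

# Hypothetical schema graph: both classes share an invented superclass.
sub_class_of = {
    "bibo:Book": ["ex:Document"],
    "bibo:Proceedings": ["ex:Document"],
}
triples = [
    ("i1", "rdf:type", "bibo:Book"),
    ("i5", "rdf:type", "bibo:Proceedings"),
]
types = inferred_types(triples, sub_class_of)
shared = types["i1"] & types["i5"]
print(shared)
```

After inference, a query for the shared superclass matches both subjects, which is the kind of semantic similarity the search scenario asks about.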
The FLuID Model
• Label Parameterized Object Cluster: summarizes instances that have the same set of connected objects, considering only edges whose property is rdf:type.
• Ontology parameterization: RDFS Schema Graph
• Instance parameterization: owl:sameAs
[Figure: example data graph with nodes i1 to i10, edges labeled dct:creator, rdf:type, and owl:sameAs, and the class labels bibo:Book, foaf:Agent, and bibo:Proceedings.]
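The instance parameterization with owl:sameAs can be sketched as merging instances that are declared to be the same before the schema is computed. The example below uses a small union-find structure; this is a common technique chosen here for illustration, not necessarily what FLuID does internally, and the function names and triples are hypothetical.

```python
from collections import defaultdict

def find(parent, x):
    """Return the representative of x, compressing the path on the way."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge_same_as(triples):
    """Merge instances connected by owl:sameAs and pool their edges."""
    parent = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(parent, s)] = find(parent, o)
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != "owl:sameAs":
            merged[find(parent, s)].add((p, o))
    return dict(merged)

triples = [
    ("i6", "dct:creator", "i7"),
    ("i6", "owl:sameAs", "i8"),
    ("i8", "rdf:type", "bibo:Book"),
]
merged = merge_same_as(triples)
print(merged)
```

The merged instance carries both the dct:creator edge of i6 and the rdf:type edge of i8, so a schema computed on it sees the combined description.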
A Semantic Schema-level Index
[Figure: the index, connected by steps 1 and 2 to three example records. Query structure: bibo:Book, dct:creator, foaf:Agent, dct:subject. Example data: "Proceedings of the …" with "Benjamin Elizalde" (bibo:Proceedings, foaf:Agent, dct:subject; nodes URI-6, URI-8, URI-9); "Towards a clean air policy" with "Great Britain. Central Electricity" (bibo:Book, foaf:Agent, dct:subject; nodes URI-1, URI-2, URI-3); "Family planning programmes in Africa" with dct:creator "Pierre Prader" (bibo:Book, dct:subject; nodes URI-0, URI-3, URI-4, URI-5) and an owl:sameAs edge to a foaf:Agent node "Pierre Prader" (URI-5).]
Evaluation
• Complexity Analysis
  • We show that every SLI modeled with FLuID can be computed in O(n), where n is the number of triples.
  • Threat: the on-the-fly inferencing! If the number of RDFS triples depended linearly on the dataset size, we would have quadratic complexity.
• Empirical evaluation to estimate the impact of inferencing
  • We analyzed two real-world datasets from the Web of Data:
    • TimBL-11M: 11 million triples (edges) crawled from one seed URI.
    • DyLDO-127M: 127 million triples (edges) crawled from 95,000 seed URIs.
  • Practical impact of the on-the-fly inferencing: g < 1.001.
  • Thus, we did not find a linear dependency but rather a constant factor.
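The quadratic-complexity concern can be made explicit with a simple cost model. The model is an assumption made for illustration and is not taken from the slides: each of the n data triples triggers inferencing work bounded by the number r(n) of RDFS triples, and c is a constant.

```latex
\[
T(n) = O\bigl(n \cdot r(n)\bigr), \qquad
r(n) = c \cdot n \;\Rightarrow\; T(n) = O(n^2), \qquad
r(n) \le c \;\Rightarrow\; T(n) = O(n).
\]
```

Under this reading, an empirical finding that inferencing adds only a constant factor corresponds to the third case.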
Conclusion & Outlook
• Conclusion
  • We have presented the novel, parameterized schema-level index model FLuID, which is sufficient to express the functionalities of existing SLIs and beyond.
  • We showed that the build-time and space complexity of any SLI developed with FLuID scales linearly with respect to the number of triples indexed.
• Outlook
  • Implementing FLuID in a single computation and query framework
    • https://github.com/t-blume/fluid-framework
    • http://lodatio.informatik.uni-kiel.de/
  • Qualitatively comparing existing and new approaches.
Thank you for your attention!
Any questions?
Project consortium and funding agency
MOVING is funded by the EU Horizon 2020 Programme under the project number INSO-4-2015: 693092
References
1. F. Benedetti, S. Bergamaschi, and L. Po. Exposing the underlying schema of LOD sources. In Joint IEEE/WIC/ACM WI and IAT, 2015.
2. M. Ciglan, K. Nørvåg, and L. Hluchý. The SemSets model for ad-hoc semantic list search. In WWW, 2012.
3. L. Galke, F. Mai, A. Schelten, D. Brunsch, and A. Scherp. Using titles vs. full-text as source for automated semantic document annotation. In K-CAP, 2017.
4. L. Galke, A. Saleh, and A. Scherp. Evaluating the impact of word embeddings on similarity scoring in practical information retrieval. In INFORMATIK, 2017.
5. R. Goldman and J. Widom. DataGuides: Enabling query formulation and optimization in semistructured databases. In VLDB, 1997.
6. J. Jett, T. Nurmikko-Fuller, T. W. Cole, K. R. Page, and J. S. Downie. Enhancing scholarly use of digital libraries: A comparative survey and review of bibliographic metadata ontologies. In JCDL, 2016.
7. M. Konrath, T. Gottron, S. Staab, and A. Scherp. SchemEX - efficient construction of a data catalogue by stream-based indexing of Linked Data. J. Web Sem., 16:52–58, 2012.
8. J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom. Lore: a database management system for semistructured data. SIGMOD Record, 26(3):54–66, 1997.
9. T. Neumann and G. Moerkotte. Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins. In ICDE, 2011.
10. J. Schaible, T. Gottron, and A. Scherp. TermPicker: Enabling the reuse of vocabulary terms by exploiting data from the Linked Open Data cloud. In ESWC, 2016.
11. B. Spahiu, R. Porrini, M. Palmonari, A. Rula, and A. Maurino. ABSTAT: ontology-driven Linked Data summaries with pattern minimalization. In ESWC Satellite Events, Revised Selected Papers, 2016.
Search Engine Prototype: LODatio+
http://lodatio.informatik.uni-kiel.de
Real World Application Scenario
http://platform.moving-project.eu


Editor's Notes

  • #3, #4, #5, #21: A structural query is a query without instance information such as the title or the author. The index is computed from the data; for example, we crawl open databases on the Web.
  • #9: Colors indicate partitions on the data graph.
  • #10 to #14: p1 = rdf:type, p2 = dct:creator, p3 = owl:sameAs, p4 = rdfs:subClassOf.
  • #16: Build time is important for the computation of the index. Index size influences the query time.