1 Knowledge Representation & Reasoning, Computer Science Department
MODELLING AND QUERYING LISTS
IN RDF: A PRAGMATIC STUDY
Enrico Daga, The Open University
Albert Meroño-Peñuela, Vrije Universiteit Amsterdam
@albertmeronyo
Enrico Motta, The Open University
QuWeDa 2019: 3rd Workshop on Querying and Benchmarking the
Web of Data
ISWC, 26 October, Auckland
2
• LOD publishing should make data easy to consume
• Modelling choices are often left to subjective judgement
• These practices and their reuse are key to query performance
• Lists are everywhere! Co-authors, timelines, media, recipes, etc.
MOTIVATION
3
• And in MIDI
…
[ 144, 60, 100 ]
[ 128, 60, 64 ]
…
[Pic of music editing software]
• So what do we know about the performance of RDF list solutions?
MOTIVATION
4
• Modelling of RDF lists
> RDF(S) container classes (rdf:Bag, rdf:Alt, rdf:Seq)
> Closed collections (rdf:List: rdf:first, rdf:rest, rdf:nil)
> JSON-LD/Turtle syntaxes: "@list": [ "joe", "bob", "jaybee" ],
:a :b ( "bob" "alice" "carol" )
> Ontology Design Patterns: Sequence OP, Collections Ontology
• Benchmark datasets and queries
> BSBM, LUBM, SP2Bench, DBpedia SPARQL, WatDiv
> LSQ
> IGUANA, LDBC
RELATED WORK
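A minimal Python sketch of how the two W3C options above differ structurally, using the Turtle example ( "bob" "alice" "carol" ). It is an illustration only: triples are plain tuples, and the blank-node labels (_:cell0, _:seq) and function names are hypothetical. A closed collection becomes a chain of rdf:first/rdf:rest cells terminated by rdf:nil, whereas an rdf:Seq hangs every member off one node through the membership properties rdf:_1, rdf:_2, and so on.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def as_rdf_list(items):
    """Expand items into rdf:first/rdf:rest triples; returns (head, triples)."""
    if not items:
        return RDF + "nil", []
    # Hypothetical blank-node labels, one per list cell.
    cells = [f"_:cell{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # The last cell points at rdf:nil, which is what closes the list.
        following = cells[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((cells[i], RDF + "first", item))
        triples.append((cells[i], RDF + "rest", following))
    return cells[0], triples


def as_rdf_seq(items, node="_:seq"):
    """Attach items to one node via rdf:_1, rdf:_2, ...; returns (node, triples)."""
    triples = [(node, RDF + "type", RDF + "Seq")]
    triples += [(node, f"{RDF}_{i}", item) for i, item in enumerate(items, start=1)]
    return node, triples


names = ["bob", "alice", "carol"]
list_head, list_triples = as_rdf_list(names)
seq_node, seq_triples = as_rdf_seq(names)
```

Reaching the n-th item of the rdf:List form means following n - 1 rdf:rest links, while the rdf:Seq form encodes the position in the predicate itself.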
5
What RDF list models are common in LOD? What is their impact on
performance when retrieving them? Can we identify patterns enabling
sustainability?
C1: Survey of common list modelling practices in RDF
C2: Their comparison when queried from common triplestores across
various sizes and operations
RESEARCH QUESTIONS & CONTRIBUTIONS
6
CQ1. Full list lookup: What is the ordered content of the list?
CQ2. N-th lookup: Which is the n-th item in the list?
CQ3. Ordered range: What are the n…m items in the list?
Aimed at supporting the LOD publishing use case
Do not deal with list management (edit, merge, split, etc.)
Focus on minimal and atomic operations related to ordered list access
REQUIREMENTS (OPERATIONS)
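The three operations can be sketched in Python over a list stored as (ordering key, item) pairs. This is an illustration under stated assumptions, not the benchmark's implementation: positions are taken to be 1-based, the function names are hypothetical, and the sample events are invented for the example.

```python
def full_list(events):
    """CQ1: every item, ordered by its key (the role ORDER BY plays in SPARQL)."""
    return [item for _, item in sorted(events)]


def nth(events, n):
    """CQ2: the n-th item, 1-based (comparable to OFFSET n-1 LIMIT 1)."""
    return full_list(events)[n - 1]


def ordered_range(events, n, m):
    """CQ3: items n..m inclusive, 1-based (comparable to OFFSET n-1 LIMIT m-n+1)."""
    offset, limit = n - 1, m - n + 1
    return full_list(events)[offset:offset + limit]


# Invented sample data: (tick, event label) pairs stored out of order.
events = [(480, "note_off"), (0, "note_on"), (960, "end_of_track")]
```

All three are read-only, which matches the scope above: no editing, merging, or splitting of lists.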
7
Surveyed from:
• W3C standards
• The Ontology Design Patterns portal
• List choices in RDF datasets from ISWC resource track papers
• Linked Open Vocabularies (LOV)
• LOD Laundromat/LOD-a-lot file
Findings: RDF Sequences, RDF Lists, URI-based Lists, Number-based
Lists, Timestamp-based Lists, Sequence Ontology Pattern
LIST PATTERNS
8
RDF SEQUENCE AND RDF LIST
[SEQ]
[LIST]
9
URI, NUMBER, TIMESTAMP IMPLICIT ORDERING
[URI]
[NUM] [TIME]
10
SEQUENCE ONTOLOGY PATTERN
[SOP]
11
[SEQ] WHERE { :list a midi:Track ; midi:hasEvents [ ?seq ?event ] .
  BIND (xsd:integer(SUBSTR(str(?seq), 45)) AS ?index)
} ORDER BY ?index → OFFSET <N> LIMIT <M-N+1>
[LIST] SELECT ?event (COUNT(?step) AS ?index) WHERE {
  :list a midi:Track ; midi:hasEvents ?events . ?events rdf:rest* ?step .
  ?step rdf:rest* ?elt . ?elt rdf:first ?event .
} GROUP BY ?event ORDER BY ?index → rdf:rest{N}, /…{N}…/
[URI] WHERE { [] a midi:Track ; midi:hasEvent ?event .
  BIND (xsd:integer(SUBSTR(str(?event), 77)) AS ?id) } ORDER BY ?id → OFFSET…
[NUM/TIME] WHERE { [] a midi:Track ; midi:hasEvent ?event .
  ?event midi:absoluteTick ?tick . } ORDER BY ?tick
[SOP] WHERE { [] a midi:Track ; midi:hasEvent ?event .
  ?event sequence:precedes? ?next_event . ?next_event sequence:follows? ?event .
  BIND (xsd:integer(SUBSTR(str(?event), 77)) AS ?id)
} ORDER BY ?id
FORMALIZATION
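A minimal Python sketch (not the authors' code) of what the [SEQ] and [LIST] queries above compute. For [SEQ], it assumes ?seq binds to membership properties such as rdf:_12: the namespace plus the underscore is 44 characters long, which is consistent with the 1-based SUBSTR position 45. For [LIST], it mimics the COUNT over the two rdf:rest* paths, assuming list items are distinct. The dictionaries first and rest and the cell names c0..c2 are hypothetical stand-ins for the rdf:first and rdf:rest triples.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NIL = RDF + "nil"


def seq_index(predicate):
    """[SEQ]: parse the position out of an rdf:_N membership property IRI."""
    # SPARQL SUBSTR is 1-based, so position 45 is Python offset 44.
    return int(predicate[44:])


def list_indexes(head, first, rest):
    """[LIST]: index of an item = number of cells ?step lying between the
    head and the item's cell, inclusive (head rest* step, step rest* elt)."""
    chain = []
    cell = head
    while cell != NIL:
        chain.append(cell)
        cell = rest[cell]
    indexes = {}
    bindings = 0  # total (step, elt) pairs the path patterns produce
    for position, elt in enumerate(chain, start=1):
        steps = chain[:position]
        bindings += len(steps)
        indexes[first[elt]] = len(steps)
    return indexes, bindings


# Hypothetical three-cell list holding "a", "b", "c".
first = {"c0": "a", "c1": "b", "c2": "c"}
rest = {"c0": "c1", "c1": "c2", "c2": NIL}
indexes, bindings = list_indexes("c0", first, rest)
```

For a list of n cells the two path patterns produce n(n+1)/2 intermediate pairs before grouping, which hints at why the link-based form is costly to order, whereas the [SEQ] index is read directly from each predicate.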
12
• Dataset: MIDI Linked Data Cloud, 300K MIDIs in RDF [Meroño-Peñuela
et al. ISWC 2017]
• Benchmark: List.MID (come see the ISWC resource paper!)
• Size dimension: lists of 1k, 30k, 60k, 90k, 120k elements
• Pattern dimension: list patterns
• Operations: SPARQL for full list, n-th element, n-m range
• Triplestores: Virtuoso V7, Blazegraph 2.1.5, Fuseki v3 TDB, Fuseki
v3 Memory
EVALUATION
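The evaluation dimensions above span a grid that can be enumerated with a short Python sketch. The variable names are hypothetical, and whether every combination was actually executed is not stated here, so the count below is only the size of the full cross product.

```python
from itertools import product

# Dimension values as listed in the evaluation setup above.
sizes = ["1k", "30k", "60k", "90k", "120k"]
patterns = ["SEQ", "LIST", "URI", "NUM", "TIME", "SOP"]
operations = ["full list", "n-th element", "n-m range"]
triplestores = ["Virtuoso V7", "Blazegraph 2.1.5", "Fuseki v3 TDB", "Fuseki v3 Memory"]

# One tuple per (triplestore, pattern, operation, size) configuration.
runs = list(product(triplestores, patterns, operations, sizes))
print(len(runs))
```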
13
RESULTS CQ1 (FULL LIST)
14
RESULTS CQ2 (N-TH ELEMENT)
15
RESULTS CQ3 (N…M RANGE)
16
• Coherent behavior among triplestores (model > optimization)
• rdf:List is elegant but performs poorly (Fuseki timeout)
• SOP scales better than rdf:List yet is less efficient than property-based
lists
• rdf:Seq and property-based [NUM], [TIME], [URI] perform best
> We hypothesize this is mostly due to P and S-O database indexes, respectively
• Virtuoso’s management of OFFSET, LIMIT on [NUM], [TIME]
• rdf:Seq is a good trade-off but strictly for open lists
> Indices rdf:_N do not guarantee random access
> Update in SPARQL 1.2 spec?
OBSERVATIONS
17
Lists are important! But how to assess the impact of their models?
• 6 common list patterns in RDF and their performance comparison
• 2 model families: link-based lists, property-based lists
• For our CQs, inelegant literals > link-based lists
Limitations/future work:
• Limited set of list operations (e.g. rdf:List could win on addition)
• No triplestore optimization
• Apply methodology to other data structures
CONCLUSIONS
18
Questions, comments, suggestions
most welcome
@enridaga
@albertmeronyo
https://github.com/MIDI-LD/List.MID
THANK YOU
19
• Motivation
• Related Work
• Requirements
• List Patterns
• Queries
• Performance experiments
• Conclusions
OUTLINE

