Deriving an Emergent Relational Schema from RDF Data
Minh-Duc Pham (duc@cwi.nl), Linnea Passing (linnea.passing@tum.de), Orri Erling (oerling@openlinksw.com), Peter Boncz (boncz@cwi.nl)
Main Problems in RDF Data Management [1]
RDF de-emphasizes the need for a schema and the notion of structure in the data
▪ Bad query plans
▪ Low storage locality
▪ Lack of user schema insight
Emergent Schema
[1] Minh-Duc Pham, Boncz P.A., "Self-organizing Structured RDF in MonetDB," PhD Symposium, ICDE, 2013
Recovering the Emergent Schema of RDF data
Emergent schema = “rough” schema to which the majority of triples conforms
Recognize:
▪ Classes (Characteristic Sets, CS’s): recognize “classes” of often co-occurring properties
▪ Relationships (CS): recognize often-occurring references between such classes
+ give logical names to these
Example (figure): class labeled “Book”
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://rdfs.org/sioc/ns#num_replies>
<http://purl.org/dc/terms/title>
<http://rdfs.org/sioc/ns#has_creator>
<http://purl.org/dc/terms/date>
<http://purl.org/dc/terms/created>
<http://purl.org/rss/1.0/modules/content/encoded>
Example (figure): class labeled “Author”
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/name>
<http://xmlns.com/foaf/0.1/page>
Relationship between the two classes: <has_creator>
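The notion of a characteristic set can be illustrated with a minimal Python sketch: subjects are grouped by the exact set of properties they use, and each distinct property set is a candidate class. The toy triples and the function name `characteristic_sets` are assumptions made for illustration; they are not taken from the slides or from the authors' implementation.

```python
from collections import defaultdict

def characteristic_sets(triples):
    """Group subjects by the exact set of properties they occur with."""
    props_by_subject = defaultdict(set)
    for s, p, _o in triples:
        props_by_subject[s].add(p)
    subjects_by_cs = defaultdict(set)
    for s, props in props_by_subject.items():
        subjects_by_cs[frozenset(props)].add(s)
    return dict(subjects_by_cs)

# Hypothetical toy data, using prefixed names instead of full URIs.
TRIPLES = [
    ("post1", "rdf:type", "sioc:Post"),
    ("post1", "dc:title", "Hello"),
    ("post1", "sioc:has_creator", "user1"),
    ("post2", "rdf:type", "sioc:Post"),
    ("post2", "dc:title", "World"),
    ("post2", "sioc:has_creator", "user1"),
    ("user1", "rdf:type", "foaf:Person"),
    ("user1", "foaf:name", "Ann"),
]

CS = characteristic_sets(TRIPLES)
# Print the property sets, most frequent first.
for props, subjects in sorted(CS.items(), key=lambda kv: -len(kv[1])):
    print(sorted(props), "->", sorted(subjects))
```

Here the two posts share one property set and the user forms a second one, which mirrors the "classes of often co-occurring properties" described above.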
Example of structure recognized from RDF graph
[Figure: original RDF graph (left) and self-organizing RDF representation (right). Graph nodes include inproc1, inproc2, inproc3, author2, author3, author4, conf1, conf2 and webpage1, with edges such as type and creator.]
Table 1:
ID       type          creator             title  partOf
inproc1  inproceeding  {author3, author4}  “AAA”  conf1
inproc2  inproceeding  author2             “BBB”  conf1
inproc3  inproceeding  author3             “CCC”  conf2
Table 2:
ID     type         title          issued
conf1  Conference   “conference1”  2010
conf2  Proceedings  “conference2”  2011
Foreign Key Relationship: partOf in Table 1 refers to ID in Table 2
Irregularity: marked in the figure; labels shown near it are conf2, “content.php”, “index.php” and “Table Cont.”
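The example above can be replayed in a short Python sketch that pivots triples into one table per property set and then looks for columns whose values are all row IDs of another table. The triples are transcribed from the two example tables; the function names and the strict "every value must match" rule are assumptions for illustration (the slides speak of often-occurring references, not necessarily all of them).

```python
from collections import defaultdict

# Triples transcribed from the two example tables on the slide.
TRIPLES = [
    ("inproc1", "type", "inproceeding"), ("inproc1", "creator", "author3"),
    ("inproc1", "creator", "author4"), ("inproc1", "title", "AAA"),
    ("inproc1", "partOf", "conf1"),
    ("inproc2", "type", "inproceeding"), ("inproc2", "creator", "author2"),
    ("inproc2", "title", "BBB"), ("inproc2", "partOf", "conf1"),
    ("inproc3", "type", "inproceeding"), ("inproc3", "creator", "author3"),
    ("inproc3", "title", "CCC"), ("inproc3", "partOf", "conf2"),
    ("conf1", "type", "Conference"), ("conf1", "title", "conference1"),
    ("conf1", "issued", "2010"),
    ("conf2", "type", "Proceedings"), ("conf2", "title", "conference2"),
    ("conf2", "issued", "2011"),
]

def pivot(triples):
    """Return {property set: {subject: {property: [values]}}}."""
    rows = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        rows[s][p].append(o)
    tables = defaultdict(dict)
    for s, row in rows.items():
        tables[frozenset(row)][s] = dict(row)
    return dict(tables)

def foreign_keys(tables):
    """Find columns whose values are all row IDs of another table."""
    found = []
    for src_cols, src_rows in tables.items():
        for col in sorted(src_cols):
            values = {v for row in src_rows.values() for v in row[col]}
            for dst_cols, dst_rows in tables.items():
                if dst_cols != src_cols and values <= set(dst_rows):
                    found.append((col, sorted(src_cols), sorted(dst_cols)))
    return found

TABLES = pivot(TRIPLES)
FKS = foreign_keys(TABLES)
for col, src, dst in FKS:
    print(f"{col}: table{src} -> table{dst}")
```

Multi-valued properties such as creator are kept as lists in this sketch; how a real system stores them is a separate design decision that the slide does not spell out.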
What does “schema” mean?
Relational Schema                              | Semantic Web Schema
Describes the structure of the occurring data  | Purpose: knowledge representation
Concept mixing (for convenience)               | Describes a concept universe (regardless of the data)
Designed for one database (= dataset)          | Designed for interoperability in many contexts
Statement: it is useful to have both an (Emergent) Relational Schema and a Semantic Schema for RDF data
▪ useful for systems (higher efficiency)
▪ useful for humans (easier query formulation)
When is an Emergent Schema of RDF data useful?
▪ Compact Schema
  • as few tables as possible
  • homogeneous literal types (few NULLs in the tables)
▪ Human-friendly “Labels”
  • URIs + human-understandable table/column/relationship names
▪ High “Coverage”
  • the schema should match almost all triples in the dataset
▪ Efficient to compute
  • as fast as data import
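Two of the criteria above, coverage and few NULLs, can be made concrete with a small Python sketch. The definitions used here (coverage as the fraction of triples stored in some table, NULL fraction as the share of empty cells) are assumptions chosen for illustration, as are the toy triples and table layout; the slides do not give exact formulas.

```python
def coverage(triples, tables):
    """Fraction of triples whose subject and property fall in some table."""
    covered = sum(
        1 for s, p, _o in triples
        if any(s in t["subjects"] and p in t["columns"] for t in tables)
    )
    return covered / len(triples)

def null_fraction(triples, tables):
    """Fraction of table cells for which no triple exists."""
    filled = {(s, p) for s, p, _o in triples}
    cells = [(s, c) for t in tables for s in t["subjects"] for c in t["columns"]]
    empty = sum(1 for cell in cells if cell not in filled)
    return empty / len(cells)

# Hypothetical data: two book rows, one author row, one stray triple.
TRIPLES = [
    ("b1", "title", "T1"), ("b1", "author", "a1"), ("b1", "isbn", "123"),
    ("b2", "title", "T2"), ("b2", "author", "a1"),
    ("a1", "name", "Ann"),
    ("x1", "comment", "stray"),
]
TABLES = [
    {"columns": {"title", "author", "isbn"}, "subjects": {"b1", "b2"}},
    {"columns": {"name"}, "subjects": {"a1"}},
]

COV = coverage(TRIPLES, TABLES)
NULLS = null_fraction(TRIPLES, TABLES)
print(f"coverage={COV:.3f} null_fraction={NULLS:.3f}")
```

In this toy case six of seven triples are covered, and one of seven cells (the missing isbn of b2) is NULL. The two numbers pull against each other: adding columns or merging tables tends to raise coverage but also the number of NULLs.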
Basic CS discovery
Characteristic Sets in some well-known RDF datasets
Partial and Mixed Use of Ontologies
Emerging a Relational Schema
[Figure: pipeline from the triple table to the physical relational schema, showing the basic CS’s (CS0 to CS5), their labels (label1, Label3, label4, label5) and the merged CS’s (CS5, CS6, CS7).]
Input: Triple table (S P O)
1. Extract basic CS’s
2. Labeling
3. Merge similar CS’s
4. Schema Filtering
5. Instance Filtering
Output: Physical relational schema
Parameter Tuning
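Steps 3 and 4 of the pipeline can be sketched in Python as follows. This is only an illustration under stated assumptions: similarity is measured with Jaccard overlap of property sets, merging is greedy in order of decreasing support, and schema filtering is a minimum-support cut-off. The slides do not state which merge rules or thresholds the authors use, and the names `merge_similar`, `schema_filter`, `threshold` and `min_support` are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two property sets, between 0 and 1."""
    return len(a & b) / len(a | b)

def merge_similar(cs_counts, threshold):
    """Greedily merge property sets, visiting the most frequent first.

    cs_counts maps frozenset(properties) -> number of subjects.
    """
    merged = []  # entries are [property set, subject count]
    for props, count in sorted(cs_counts.items(), key=lambda kv: -kv[1]):
        for entry in merged:
            if jaccard(entry[0], props) >= threshold:
                entry[0] = entry[0] | props  # union of the columns
                entry[1] += count
                break
        else:
            merged.append([props, count])
    return [(props, count) for props, count in merged]

def schema_filter(merged, min_support):
    """Keep only merged sets that describe enough subjects."""
    return [(props, count) for props, count in merged if count >= min_support]

# Hypothetical basic CS's with their subject counts.
BASIC = {
    frozenset({"type", "title", "creator", "partOf"}): 50,
    frozenset({"type", "title", "creator"}): 10,
    frozenset({"type", "title", "issued"}): 20,
    frozenset({"comment"}): 1,
}

MERGED = merge_similar(BASIC, threshold=0.7)
SCHEMA = schema_filter(MERGED, min_support=5)
for props, count in SCHEMA:
    print(sorted(props), count)
```

With these toy numbers the second set is absorbed into the first (overlap 3/4), the conference-like set stays separate, and the rare single-property set is dropped by the filter. Both `threshold` and `min_support` are the kind of knobs that a parameter tuning step would adjust. Labeling and instance filtering are not covered by this sketch.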
Results: compact schemas with high coverage
Results: understandable labels & performance
Likert score: 1 = bad ... 5 = excellent