1
Translation of Relational and
Non-Relational Databases
into RDF with xR2RML
F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat
I3S lab, CNRS, Univ. Nice Sophia
2
▪ Web of data → publication/interlinking of open datasets
• Goal: publish heterogeneous data in a common format (RDF)
▪ Driven by data integration initiatives, e.g.:
• Linking Open Data, 1,015 datasets
• W3C Data Activity
• BIO2RDF, 35 datasets
• Neuroscience Information Framework (12,598 registry entries)
Web-scale data integration
[Figure: Linked Datasets as of Aug. 30th, 2014. (c) R. Cyganiak and A. Jentzsch]
(Data: Apr. 2015)
3
Web-scale data integration
▪ Need to access data from the Deep Web [1]
• Structured/unstructured data, hardly indexed by search engines and hardly linked with other data sources
▪ Exponential data growth continues
• Various types of DBs: RDB, NoSQL, NewSQL, native XML, LDAP directory, OODB...
• Heterogeneous data models and query capabilities
[1] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Accessing the deep web. Communications of the ACM, 50(5):94–101, 2007
4
Web-scale data integration
To enrich the web of data with
existing and new data being created
ever faster...
... we need standardized approaches
to enable the translation of
heterogeneous data sources to RDF
5
▪ Previous works
▪ Background: R2RML and RML
▪ Description of xR2RML
▪ Evaluation and perspectives
Agenda
7
▪ Much work achieved on RDBs:
D2RQ, Virtuoso, R2RML (W3C)…
Goals: generic RDB-to-RDF, OBDA, ontology learning, schema mapping…
Methods: direct mapping vs. domain-specific mapping, materialization vs. SPARQL-to-SQL query rewriting
▪ XML: using either XPath (RML), XQuery (XSPARQL, SPARQL2XQuery) or XSLT (Scissor-Lift); XSD-to-OWL (SPARQL2XQuery)
▪ CSV/TSV/spreadsheets: CSV on the Web (W3C WG)
▪ JSON: using JSONPath (RML)
▪ Integration frameworks: DataLift, RML, Asio Tool Suite…
Previous works
8
▪ Existing approaches map specific types of databases or specific data formats to RDF
▪ Each comes with its own mapping language or UI
▪ Supporting a new system (data model and QL) is not straightforward
Previous works
No unified mapping language that applies equally to the most common types of databases (RDB, NoSQL, XML, LDAP, OO…)
Supporting a new data model and/or QL → develop a DB connector, but no change in the mapping language
9
▪ Previous works
▪ Background: R2RML and RML
▪ Description of xR2RML
▪ Evaluation and perspectives
Agenda
10
R2RML – RDB To RDF Mapping Language
▪ W3C Recommendation, 2012
▪ Goals:
• Describe mappings of relational entities to RDF
• Reuse existing ontologies
• Operationalization not addressed
▪ How: triples maps (TriplesMap, TM) define how to generate RDF triples:
• 1 logical table → rows to process
• 1 subject map → subject IRIs
• N (predicate map, object map) pairs
• 1 optional graph map → graph IRIs
▪ An R2RML mapping is itself an RDF graph
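To make the structure of a triples map concrete, here is a minimal Python sketch; it is not taken from R2RML or from any R2RML engine. It assumes the logical table has already been fetched into a list of dictionaries, handles only rr:template subjects, column-valued literal objects and a constant graph IRI, and approximates the specification's "IRI-safe" percent-encoding of template values with urllib.parse.quote. The class and function names are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import quote


@dataclass
class TriplesMap:
    """Simplified triples map: one logical table, one subject map,
    N (predicate, object column) pairs and an optional graph IRI."""
    rows: list                            # stands in for the logical table
    subject_template: str                 # subject map (rr:template)
    subject_class: Optional[str] = None   # rr:class
    pred_obj: list = field(default_factory=list)  # (predicate, column) pairs
    graph: Optional[str] = None           # optional graph map (constant here)


def expand_iri_template(template: str, row: dict) -> str:
    # Replace each {Column} with the row value, percent-encoded; this only
    # approximates R2RML's IRI-safe rule (quote also encodes non-ASCII).
    return re.sub(r"\{([^}]+)\}",
                  lambda m: quote(str(row[m.group(1)]), safe=""),
                  template)


def generate(tm: TriplesMap) -> list:
    """Return (subject, predicate, object, graph) tuples; literal objects
    are shown as quoted strings, without escaping."""
    quads = []
    for row in tm.rows:                   # logical table -> rows to process
        s = expand_iri_template(tm.subject_template, row)
        if tm.subject_class:
            quads.append((s, "rdf:type", tm.subject_class, tm.graph))
        for p, col in tm.pred_obj:        # predicate map / object map pairs
            quads.append((s, p, f'"{row[col]}"', tm.graph))
    return quads
```

For the Study table of the next slide, a TriplesMap with subject_template "http://example.org/study#{Id}", class ex:Study and the pair ("ex:hasName", "Acronym") yields the rdf:type and ex:hasName triples of study 10.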
11
R2RML – RDB To RDF Mapping Language
Table "Study":
Id | Acronym | Centre_Id
10 | CAC2010 | 4
Table "Centre":
Id | Name | address
4 | Pasteur | ...
Foreign key (FK): Study.Centre_Id → Centre.Id
R2RML mapping graph:
<#Centre> a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Centre" ];
  rr:subjectMap [ rr:class ex:Centre;
    rr:template "http://example.org/centre#{Name}"; ].
<#Study> a rr:TriplesMap;
  rr:logicalTable [ rr:tableName "Study" ];
  rr:subjectMap [ rr:class ex:Study;
    rr:template "http://example.org/study#{Id}"; ];
  rr:predicateObjectMap [
    rr:predicate ex:hasName;
    rr:objectMap [ rr:column "Acronym" ]; ];
  rr:predicateObjectMap [
    rr:predicate ex:locatedIn;
    rr:objectMap [
      rr:parentTriplesMap <#Centre>;
      rr:joinCondition [
        rr:child "Centre_Id";
        rr:parent "Id";
      ]; ]; ].
Produced RDF:
<http://example.org/centre#Pasteur> a ex:Centre.
<http://example.org/study#10> a ex:Study;
  ex:hasName "CAC2010";
  ex:locatedIn <http://example.org/centre#Pasteur>.
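The referencing object map above can be read as a join between the child (Study) and parent (Centre) tables. The Python sketch below reproduces the produced RDF under simplifying assumptions: both tables are small in-memory lists (hypothetical stand-ins for the database), the join is a naive nested loop rather than the joint SQL query described by the R2RML specification, and output lines keep the ex: prefix unexpanded.

```python
# Hypothetical in-memory copies of the "Centre" and "Study" tables.
CENTRE = [{"Id": 4, "Name": "Pasteur"}]
STUDY = [{"Id": 10, "Acronym": "CAC2010", "Centre_Id": 4}]

# Subject templates of <#Centre> and <#Study>; str.format accepts the same
# {Column} placeholders as long as column names are plain identifiers.
CENTRE_TPL = "http://example.org/centre#{Name}"
STUDY_TPL = "http://example.org/study#{Id}"


def run_mapping(centres, studies):
    out = []
    for c in centres:                                   # <#Centre>
        out.append(f"<{CENTRE_TPL.format(**c)}> a ex:Centre .")
    for s in studies:                                   # <#Study>
        subj = f"<{STUDY_TPL.format(**s)}>"
        out.append(f"{subj} a ex:Study .")
        out.append(f'{subj} ex:hasName "{s["Acronym"]}" .')
        # Referencing object map: join child "Centre_Id" with parent "Id",
        # then build the object IRI with the parent map's subject template.
        for c in centres:
            if s["Centre_Id"] == c["Id"]:
                out.append(f"{subj} ex:locatedIn <{CENTRE_TPL.format(**c)}> .")
    return out
```

A nested loop is enough for an illustration; on real tables an engine would rather let the database evaluate the join.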
12
RML extensions to R2RML
RML mapping graph:
<#Centre>
  rml:logicalSource [
    rml:source "http://example.org/Centres.xml";
    rml:referenceFormulation ql:XPath;
    rml:iterator "/centres/centre";
  ];
  rr:subjectMap [
    rr:class ex:Centre;
    rr:template "http://example.org/centre#{//centre/@Id}";
  ];
  rr:predicateObjectMap [
    rr:predicate ex:hasName;
    rr:objectMap [ rml:reference "//centre/name" ];
  ];
XML document:
<centres>
  <centre Id="4">
    <name>Pasteur</name>
  </centre>
  <centre Id="6">
    <name>Pontchaillou</name>
  </centre>
</centres>
Advantages:
• Extends R2RML to CSV, JSON and XML sources
• Maps several sources simultaneously
Limitations:
• Fixed list of reference formulations
• No distinction between reference formulation and query language
• No RDF collections
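A minimal Python sketch of what the RML logical source above amounts to, using the standard xml.etree.ElementTree module: iterate over the centre elements and read the Id attribute and the name child of each one. It is not an RML processor; in particular it evaluates both references relative to each iterated element, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

CENTRES_XML = """<centres>
  <centre Id="4"><name>Pasteur</name></centre>
  <centre Id="6"><name>Pontchaillou</name></centre>
</centres>"""


def map_centres(xml_text):
    root = ET.fromstring(xml_text)        # root element is <centres>
    triples = []
    # Iterator "/centres/centre": ElementTree paths are relative to the root,
    # so "centre" selects the <centre> children of <centres>.
    for centre in root.findall("centre"):
        s = f"<http://example.org/centre#{centre.get('Id')}>"
        name = centre.findtext("name")
        triples.append(f"{s} a ex:Centre .")
        triples.append(f'{s} ex:hasName "{name}" .')
    return triples
```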
13
▪ Previous works
▪ Background: R2RML and RML
▪ Description of xR2RML
▪ Evaluation and perspectives
Agenda
14
xR2RML - Overall picture
[Figure: xR2RML mapping description, xR2RML translation engine, source database queried in its native QL, domain ontologies]
Flexible language to describe mappings from the most common types of databases to RDF.
Extends R2RML and leverages the RML extensions.
15
xR2RML: Logical source
XML database supporting XQuery:
<centres>
  <centre Id="4">
    <name>Pasteur</name>
  </centre>
  <centre Id="6">
    <name>Pontchaillou</name>
  </centre>
</centres>
xR2RML mapping graph:
<#Centre>
  xrr:logicalSource [
    xrr:query '''for $x in doc("centres.xml")/centres/centre
                 where ... return $x''';
  ];
rr: R2RML vocabulary
xrr: xR2RML vocabulary
17
xR2RML: Data element references
XML database supporting XQuery:
<centres>
  <centre Id="4">
    <name>Pasteur</name>
  </centre>
  <centre Id="6">
    <name>Pontchaillou</name>
  </centre>
</centres>
xR2RML mapping graph:
<#Centre>
  xrr:logicalSource [
    xrr:query '''for $x in doc("centres.xml")/centres/centre
                 where ... return $x''';
  ];
  rr:subjectMap [
    rr:class ex:Centre;
    rr:template "http://example.org/centre#{//centre/@Id}";
  ];
  rr:predicateObjectMap [
    rr:predicate ex:hasName;
    rr:objectMap [ xrr:reference "//centre/name" ];
  ];
rr: R2RML vocabulary
xrr: xR2RML vocabulary
xR2RML engine usage guidelines
Type of DB | xrr:query | xrr:reference, rr:template
RDB, column stores | SQL, CQL, HQL | Column name
Native XML DB | XQuery | XPath
NoSQL document store | Proprietary, JS-based | JSONPath
SPARQL endpoint | SPARQL | Variable name, column name (s, p, o)
Neo4j (graph DB) | Cypher | Column name (s, p, o)
LDAP directory | LDAP query | Attribute name
... | ... | ...
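For document stores the table lists JSONPath as the reference syntax, and a single reference may return several values. The sketch below evaluates a tiny, assumed subset of JSONPath ("$" followed by ".key" or ".*" steps) in Python; real JSONPath implementations support much more (brackets, filters, recursive descent), and the helper name is hypothetical.

```python
import json

# The study document used on the following slides.
STUDY_DOC = json.loads("""
{ "studyid": 10, "acronym": "CAC2010",
  "centres": [ { "centreid": 4, "name": "Pasteur" },
               { "centreid": 6, "name": "Pontchaillou" } ] }
""")


def eval_jsonpath(doc, path: str) -> list:
    """Evaluate '$' followed by '.key' or '.*' steps; return all matches."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = [doc]
    for step in (s for s in path[1:].split(".") if s):
        nxt = []
        for node in current:
            if step == "*":               # wildcard: every member/element
                if isinstance(node, list):
                    nxt.extend(node)
                elif isinstance(node, dict):
                    nxt.extend(node.values())
            elif isinstance(node, dict) and step in node:
                nxt.append(node[step])
        current = nxt
    return current
```

Because "$.centres.*.name" returns two values here, the mapping has to say what to do with them, which is the question addressed next.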
18
{ "studyid": 10,
"acronym": "CAC2010",
"centres": [
{ "centreid": 4, "name": "Pasteur" },
{ "centreid": 6, "name": "Pontchaillou" }
]
}
xR2RML: multiple values vs. RDF list/container
Mapping case: link the study
with the centres it involves
Multiple values:
<http://example.org/study#10> ex:involves "Pasteur".
<http://example.org/study#10> ex:involves "Pontchaillou".
RDF list:
<http://example.org/study#10> ex:involvesCenters ( "Pasteur" "Pontchaillou" ).
19
{ "studyid": 10,
"acronym": "CAC2010",
"centres": [
{ "centreid": 4, "name": "Pasteur" },
{ "centreid": 6, "name": "Pontchaillou" }
]
}
xR2RML: multiple values vs. RDF list/container
Mapping case: link the study
with the centres it involves
rr:objectMap [
  xrr:reference "$.centres.*.name";
  rr:termType xrr:RdfList;
];
R2RML term types: rr:IRI, rr:Literal, rr:BlankNode
xR2RML term types: xrr:RdfList, xrr:RdfSeq, xrr:RdfBag, xrr:RdfAlt
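A Python sketch contrasting the possible outcomes for the values returned by "$.centres.*.name": one triple per value versus a single collection or container term. The functions are hypothetical helpers that emit Turtle-style text; literal escaping and blank-node labelling are deliberately omitted.

```python
def multiple_triples(subject, predicate, values):
    """One triple per value returned by the reference."""
    return [f'{subject} {predicate} "{v}" .' for v in values]


def as_collection(subject, predicate, values):
    """xrr:RdfList: a single triple whose object is a Turtle collection."""
    items = " ".join(f'"{v}"' for v in values)
    return f"{subject} {predicate} ( {items} ) ."


def as_container(subject, predicate, values, kind="Seq"):
    """xrr:RdfSeq/RdfBag/RdfAlt: a blank node typed rdf:Seq, rdf:Bag or
    rdf:Alt whose members are attached with rdf:_1, rdf:_2, ..."""
    members = "; ".join(f'rdf:_{i} "{v}"' for i, v in enumerate(values, start=1))
    return f"{subject} {predicate} [ a rdf:{kind}; {members} ] ."


study = "<http://example.org/study#10>"
names = ["Pasteur", "Pontchaillou"]   # values of "$.centres.*.name"
```

Collections keep order and are closed, whereas containers are open-ended; the term type lets the mapping author choose the semantics rather than the engine.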
20
xR2RML: nested collections
From structured values (XML, JSON...) with nested collections and key-value associations...
... to RDF: → generate nested lists/containers and qualify their members (datatype, language tag...)
E.g.: produce a list of lists of strings:
rr:objectMap [
  xrr:reference "...";
  rr:termType xrr:RdfList;
  xrr:nestedTermMap [
    xrr:reference "...";
    rr:termType xrr:RdfList;
    xrr:nestedTermMap [
      rr:datatype xsd:string;
    ]; ]; ];
(
  ( "John"^^xsd:string "Bob"^^xsd:string )
  ( "Ted"^^xsd:string "Mark"^^xsd:string )
)
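Nested term maps essentially recurse over nested values. Assuming the references have already produced a nested Python list (the slide elides the actual paths), this recursive sketch serializes it as nested Turtle collections and qualifies the leaves with a datatype or a language tag; the function name and parameters are illustrative.

```python
def to_turtle_term(value, datatype=None, lang=None):
    """Serialize a (possibly nested) list as nested Turtle collections;
    leaves become literals with an optional language tag or datatype."""
    if isinstance(value, list):
        inner = " ".join(to_turtle_term(v, datatype, lang) for v in value)
        return f"( {inner} )" if inner else "()"
    if lang:                       # e.g. "chat"@fr
        return f'"{value}"@{lang}'
    if datatype:                   # e.g. "John"^^xsd:string
        return f'"{value}"^^{datatype}'
    return f'"{value}"'
```

Calling to_turtle_term([["John", "Bob"], ["Ted", "Mark"]], datatype="xsd:string") gives the list of lists shown on the slide, on a single line.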
22
xR2RML: cross-references
MongoDB database:
Collection "studies":
{ "studyid": 10,
  "acronym": "CAC2010",
  "centres": [ 4, 6 ]
}
Collection "centres":
{ "centreid": 4,
  "name": "Pasteur" },
{ "centreid": 6,
  "name": "Pontchaillou" }
xR2RML mapping graph:
<#Centre>
  xrr:logicalSource [ ... ]; rr:subjectMap [ ... ].
<#Study>
  xrr:logicalSource [ ... ]; rr:subjectMap [ ... ];
  rr:predicateObjectMap [
    rr:predicate ex:involvesSeq;
    rr:objectMap [
      rr:parentTriplesMap <#Centre>;
      rr:joinCondition [
        rr:child "$.centres.*";
        rr:parent "$.centreid";
      ];
      rr:termType xrr:RdfSeq;
    ];
  ].
Joint query pushed to the database if supported; otherwise, the join is performed by the xR2RML engine.
Produced RDF:
<http://example.org/study#10> ex:involvesSeq
  [ a rdf:Seq;
    rdf:_1 <http://example.org/centre#Pasteur>;
    rdf:_2 <http://example.org/centre#Pontchaillou>; ].
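When the database cannot evaluate the joint query, the engine has to perform the join itself. The Python sketch below shows one naive way to do it for the two collections above. The subject templates are assumptions, since both subject maps are elided on the slide, and the container members follow the order of the child array.

```python
# Hypothetical in-memory copies of the two MongoDB collections.
STUDIES = [{"studyid": 10, "acronym": "CAC2010", "centres": [4, 6]}]
CENTRES = [{"centreid": 4, "name": "Pasteur"},
           {"centreid": 6, "name": "Pontchaillou"}]

# Assumed subject templates (elided as "[ ... ]" in the mapping).
CENTRE_TPL = "http://example.org/centre#{name}"
STUDY_TPL = "http://example.org/study#{studyid}"


def involves_seq(studies, centres):
    out = []
    for study in studies:
        members = []
        # Child side of the join: "$.centres.*" -> each array element.
        for value in study["centres"]:
            # Parent side: "$.centreid" must equal the child value.
            for centre in centres:
                if centre["centreid"] == value:
                    members.append("<" + CENTRE_TPL.format(**centre) + ">")
        body = "; ".join(f"rdf:_{i} {m}" for i, m in enumerate(members, 1))
        subj = "<" + STUDY_TPL.format(**study) + ">"
        out.append(f"{subj} ex:involvesSeq [ a rdf:Seq; {body} ] .")
    return out
```

Preserving the child array order is a design choice here: it is what makes rdf:_1 and rdf:_2 meaningful in an rdf:Seq.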
24
xR2RML: content with mixed formats
Data with mixed content:
relational table "STAFF", column "Name" contains JSON data:
... | Name | ...
... | { "FirstName": "Bob", "LastName": "Smith" } | ...
xR2RML mapping graph:
<#Centre>
  xrr:logicalSource [
    xrr:sourceName "STAFF";
  ];
  ...
  rr:predicateObjectMap [
    rr:predicate ex:first-name;
    rr:objectMap [
      xrr:reference "Column(Name)/JSONPath($.FirstName)" ];
  ];
Data format | Syntax path constructor
Row | Column(), CSV(), TSV()
XML | XPath()
JSON | JSONPath()
... | ...
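The path above chains two constructors: Column(Name) selects the cell, then JSONPath($.FirstName) is applied to the JSON text it contains. The Python sketch below evaluates such paths for just these two constructors, with JSONPath restricted to plain object keys; the parsing regular expression, the sample row and the function name are assumptions, not the xR2RML engine's implementation.

```python
import json
import re

# One path step: a constructor name and its argument, e.g. Column(Name).
STEP = re.compile(r"(Column|JSONPath)\((.*?)\)")

# Hypothetical row of the STAFF table; the Name cell holds JSON text.
STAFF_ROW = {"Name": '{"FirstName": "Bob", "LastName": "Smith"}'}


def eval_mixed_path(row: dict, path: str):
    value = row
    for kind, arg in STEP.findall(path):
        if kind == "Column":
            value = value[arg]            # select the cell of the row
        else:
            # JSONPath step: parse the text produced by the previous step,
            # then follow plain keys such as $.FirstName or $.a.b
            doc = json.loads(value) if isinstance(value, str) else value
            for key in arg.lstrip("$").split("."):
                if key:
                    doc = doc[key]
            value = doc
    return value
```

Each constructor only has to know its own format, so new formats (e.g. XPath() over an XML cell) could be added as further step kinds without changing the path syntax.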
25
▪ Previous works
▪ Background: R2RML and RML
▪ Description of xR2RML main features
▪ Evaluation and perspectives
Agenda
26
▪ Use case: study the history and transmission of zoological knowledge across historical periods
▪ TAXREF taxonomical reference
• Designed to support studies in Conservation Biology, enriched with bioarchaeological taxa
• Maintained by the French National Museum of Natural History
• ~450,000 terms, available in CSV/JSON/XML
Use case in Digital Humanities
27
▪ Ongoing work [2]: construction of a SKOS¹ thesaurus based on TAXREF
• Import of TAXREF/JSON into MongoDB
• Use of Morph-xR2RML, the prototype implementation of xR2RML, to convert the MongoDB data to RDF
• Alignment with existing, widely adopted ontologies (e.g. NCBI Taxonomic Classification, GeoNames...):
  • Static alignments at mapping design time
  • Using automatic alignment methods
Use case in Digital Humanities
¹ SKOS: Simple Knowledge Organization System, a W3C RDF-based standard to represent controlled vocabularies, taxonomies and thesauri. It bridges the gap between existing KOS and the Semantic Web and Linked Data.
28
▪ Ongoing discussions about the use of xR2RML to support ecology and agronomy studies
• Large phenotype databases
▪ Consider the query rewriting approach to support large datasets
▪ How to write xR2RML mappings:
• Automatic xR2RML mapping generation from data schemas (XSD/DTD, JSON Schema, JSON-LD...)
• Schema mapping
• Schema discovery
Perspectives
29
Conclusions
▪ The data deluge keeps growing ever faster
▪ Data are stored in many kinds of databases
▪ xR2RML:
• Flexible language to map the most common types of databases to RDF
• Supports various data models and query languages
• Rich features: RDF collections/containers, joins, content with mixed formats
▪ Applied to the construction of a SKOS thesaurus of TAXREF, a taxonomical reference
30
Contacts:
Franck Michel
Johan Montagnat
Catherine Faron-Zucker
[2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, and J. Montagnat. Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC’15.
[3] F. Michel, L. Djimenou, C. Faron-Zucker, and J. Montagnat. xR2RML: Non-Relational Databases to RDF Mapping Language. Research report ISRN I3S/RR 2014-04-FR. http://hal.archives-ouvertes.fr/hal-01066663
https://github.com/frmichel/morph-xr2rml/

More Related Content

PDF
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
PPTX
Integrating Heterogeneous Data Sources in the Web of Data
PPTX
SWT Lecture Session 2 - RDF
PPTX
Expanding the content categories at JaLC
PDF
An Introduction to RDF and the Web of Data
PDF
Uplift – Generating RDF datasets from non-RDF data with R2RML
PDF
20160818 Semantics and Linkage of Archived Catalogs
PDF
NISO/DCMI Webinar: International Bibliographic Standards, Linked Data, and th...
DistLODStats: Distributed Computation of RDF Dataset Statistics - ISWC 2018 talk
Integrating Heterogeneous Data Sources in the Web of Data
SWT Lecture Session 2 - RDF
Expanding the content categories at JaLC
An Introduction to RDF and the Web of Data
Uplift – Generating RDF datasets from non-RDF data with R2RML
20160818 Semantics and Linkage of Archived Catalogs
NISO/DCMI Webinar: International Bibliographic Standards, Linked Data, and th...

What's hot (19)

PPTX
RDF data model
PDF
Web Data Management with RDF
PDF
Linked (Open) Data
PDF
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
PPTX
SWT Lecture Session 10 R2RML Part 1
ODP
2009 0807 Lod Gmod
PPT
Achieving time effective federated information from scalable rdf data using s...
PPTX
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
PPTX
SWT Lecture Session 9 - RDB2RDF direct mapping
PDF
Scaling the (evolving) web data –at low cost-
PPTX
SWT Lecture Session 3 - SPARQL
PPTX
SWT Lecture Session 8 - Rules
PPTX
SWT Lecture Session 11 - R2RML part 2
PPTX
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...
PPTX
Efficient RDF Interchange (ERI) Format for RDF Data Streams
PPTX
Democratizing Big Semantic Data management
PDF
XSPARQL CrEDIBLE workshop
PPTX
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
RDF data model
Web Data Management with RDF
Linked (Open) Data
A Generic Mapping-based Query Translation from SPARQL to Various Target Datab...
SWT Lecture Session 10 R2RML Part 1
2009 0807 Lod Gmod
Achieving time effective federated information from scalable rdf data using s...
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ...
SWT Lecture Session 9 - RDB2RDF direct mapping
Scaling the (evolving) web data –at low cost-
SWT Lecture Session 3 - SPARQL
SWT Lecture Session 8 - Rules
SWT Lecture Session 11 - R2RML part 2
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni...
Efficient RDF Interchange (ERI) Format for RDF Data Streams
Democratizing Big Semantic Data management
XSPARQL CrEDIBLE workshop
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Ad

Viewers also liked (17)

PPT
Who wants to be a millionaire game chemistry vocab
PPTX
Test PowerPoint C1
PDF
Kanchi Periva Forum - Ebook # 18 - Navaratri 2013 - Kamakshi
PPTX
sejarah komputer beserta hardware dan software
PPTX
Alopez powerpoint
PPTX
Question 5.
PDF
Sandra Roe Visual Resume
PDF
Kanchi Periva Forum Newsletter - Volume 3
PDF
PDF
Kursintroduktion entreprenörskap
DOC
Partea 1
PPTX
Landscape
PPTX
The making of Lys-de-Membre - part 4
PDF
Trabajator presentation
PPTX
Presentation1
PDF
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
PPT
Angela ajo
Who wants to be a millionaire game chemistry vocab
Test PowerPoint C1
Kanchi Periva Forum - Ebook # 18 - Navaratri 2013 - Kamakshi
sejarah komputer beserta hardware dan software
Alopez powerpoint
Question 5.
Sandra Roe Visual Resume
Kanchi Periva Forum Newsletter - Volume 3
Kursintroduktion entreprenörskap
Partea 1
Landscape
The making of Lys-de-Membre - part 4
Trabajator presentation
Presentation1
A New Approach: Automatically Identify Proper Noun from Bengali Sentence for ...
Angela ajo
Ad

Similar to Translation of Relational and Non-Relational Databases into RDF with xR2RML (20)

PPTX
Relational Database to RDF (RDB2RDF)
PPTX
Semantic Web and Related Work at W3C
PPTX
Transient and persistent RDF views over relational databases in the context o...
PDF
Adventures in Linked Data Land (presentation by Richard Light)
PDF
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
PDF
A Hands On Overview Of The Semantic Web
PDF
A Generic Language for Integrated RDF Mappings of Heterogeneous Data
PDF
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
PDF
A Review on RDB to RDF Mapping for Semantic Web
PDF
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
PPTX
RDF-Gen: Generating RDF from streaming and archival data
PPTX
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
PDF
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
PPTX
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
PPTX
Enterprise knowledge graphs
ODP
Data Integration And Visualization
PDF
Danbri Drupalcon Export
PDF
RDB2RDF, an overview of R2RML and Direct Mapping
PPTX
20100614 ISWSA Keynote
PPT
Linked Data Tutorial
Relational Database to RDF (RDB2RDF)
Semantic Web and Related Work at W3C
Transient and persistent RDF views over relational databases in the context o...
Adventures in Linked Data Land (presentation by Richard Light)
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
A Hands On Overview Of The Semantic Web
A Generic Language for Integrated RDF Mappings of Heterogeneous Data
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
A Review on RDB to RDF Mapping for Semantic Web
A REVIEW ON RDB TO RDF MAPPING FOR SEMANTIC WEB
RDF-Gen: Generating RDF from streaming and archival data
‘Facilitating User Engagement by Enriching Library Data using Semantic Techno...
Machine-Interpretable Dataset and Service Descriptions for Heterogeneous Data...
RDB2RDF Tutorial (R2RML and Direct Mapping) at ISWC 2013
Enterprise knowledge graphs
Data Integration And Visualization
Danbri Drupalcon Export
RDB2RDF, an overview of R2RML and Direct Mapping
20100614 ISWSA Keynote
Linked Data Tutorial

More from Franck Michel (14)

PDF
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
PDF
Bioschemas: Marking up biodiversity websites to improve data discovery and we...
PDF
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
PDF
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
PDF
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
PDF
Knowledge Engineering: Semantic web, web of data, linked data
PDF
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
PDF
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
PDF
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
PDF
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
PDF
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
PDF
A Mapping-based Method to Query MongoDB Documents with SPARQL
PDF
Make our Scientific Datasets Accessible and Interoperable on the Web
PDF
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...
ISSA: Generic Pipeline, Knowledge Model and Visualization tools to Help Scien...
Bioschemas: Marking up biodiversity websites to improve data discovery and we...
Unleash the Potential of your Website! 180,000 webpages from the French NHM m...
Heterogeneous Data Aggregation and Querying at Web Scale Using Semantic align...
Describe and Publish data sets on the web: vocabularies, catalogues, data por...
Knowledge Engineering: Semantic web, web of data, linked data
Enabling Automatic Discovery and Querying of Web APIs at Web Scale using Link...
Modelling Biodiversity Linked Data: Pragmatism May Narrow Future Opportunities
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ...
SPARQL Micro-Services: Lightweight Integration of Web APIs and Linked Data
Construction d’un référentiel taxonomique commun pour des études sur l’histoi...
A Mapping-based Method to Query MongoDB Documents with SPARQL
Make our Scientific Datasets Accessible and Interoperable on the Web
Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archa...

Recently uploaded (20)

PPTX
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
PDF
Biophysics 2.pdffffffffffffffffffffffffff
PPTX
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
PDF
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
PDF
An interstellar mission to test astrophysical black holes
PDF
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
PDF
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
PPTX
Taita Taveta Laboratory Technician Workshop Presentation.pptx
PDF
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
PPT
POSITIONING IN OPERATION THEATRE ROOM.ppt
PDF
Placing the Near-Earth Object Impact Probability in Context
PPTX
2. Earth - The Living Planet earth and life
PDF
Phytochemical Investigation of Miliusa longipes.pdf
PPTX
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
PPTX
Classification Systems_TAXONOMY_SCIENCE8.pptx
PDF
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
PDF
AlphaEarth Foundations and the Satellite Embedding dataset
PPTX
Cell Membrane: Structure, Composition & Functions
PPT
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
PPTX
ECG_Course_Presentation د.محمد صقران ppt
ognitive-behavioral therapy, mindfulness-based approaches, coping skills trai...
Biophysics 2.pdffffffffffffffffffffffffff
ANEMIA WITH LEUKOPENIA MDS 07_25.pptx htggtftgt fredrctvg
ELS_Q1_Module-11_Formation-of-Rock-Layers_v2.pdf
An interstellar mission to test astrophysical black holes
Unveiling a 36 billion solar mass black hole at the centre of the Cosmic Hors...
IFIT3 RNA-binding activity primores influenza A viruz infection and translati...
Taita Taveta Laboratory Technician Workshop Presentation.pptx
VARICELLA VACCINATION: A POTENTIAL STRATEGY FOR PREVENTING MULTIPLE SCLEROSIS
POSITIONING IN OPERATION THEATRE ROOM.ppt
Placing the Near-Earth Object Impact Probability in Context
2. Earth - The Living Planet earth and life
Phytochemical Investigation of Miliusa longipes.pdf
G5Q1W8 PPT SCIENCE.pptx 2025-2026 GRADE 5
Classification Systems_TAXONOMY_SCIENCE8.pptx
SEHH2274 Organic Chemistry Notes 1 Structure and Bonding.pdf
AlphaEarth Foundations and the Satellite Embedding dataset
Cell Membrane: Structure, Composition & Functions
The World of Physical Science, • Labs: Safety Simulation, Measurement Practice
ECG_Course_Presentation د.محمد صقران ppt

Translation of Relational and Non-Relational Databases into RDF with xR2RML

  • 1. 1 Translation of Relational and Non-Relational Databases into RDF with xR2RML F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat I3S lab, CNRS, Univ. Nice Sophia
  • 2. 2  Web of data  publication/interlinking of open datasets • Goal: publish heterogeneous data in a common format (RDF)  Driven by data integration initiatives, e.g.: • Linking Open Data, 1015 ds. • W3C Data Activity • BIO2RDF, 35 ds. • Neuroscience Information Framework (12598 registry entries) Web-scale data integration Linked Datasets as of Aug. 30th 2014. (c) R. Cyganiak & and A. Jentzsch (Data: Apr. 2015)
  • 3. 3 Web-scale data integration  Need to access data from the Deep Web [1] • Strd./unstrd. data hardly indexed by search engines, hardly linked with other data sources  Exponential data growth goes on • Various types of DBs: RDB, NoSQL, NewSQL, Native XML, LDAP directory, OODB... • Heterogeneous data models and query capabilities [1] B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Accessing the deep web. Communications of the ACM, 50(5):94–101, 2007
  • 4. 4 Web-scale data integration To enrich the web of data with existing and new data being created ever faster... ... we need standardized approaches to enable the translation of heterogeneous data sources to RDF
  • 5. 5  Previous works  Background: R2RML and RML  Description of xR2RML  Evaluation and perspectives Agenda
  • 6. 6  Previous works  Background: R2RML and RML  Description of xR2RML  Evaluation and perspectives Agenda
  • 7. 7  Much work achieved on RDBs D2RQ, Virtuoso, R2RML (W3C)… Goals: generic RDB-to-RDF, OBDA, ontology learning, schema mapping… Methods: direct mapping vs. domain-specific, materialization vs. SQL-to-SPARQL query rewriting  XML: using either XPath (RML), XQuery (XSPARQL, SPARQL2XQuery) or XSLT (Scissor-Lift), XSD-to-OWL (SPARQL2XQuery)  CSV/TSV/Spreadsheets: CSV on the web (W3C WG)  JSON: using JSONPath (RML)  Integration frameworks: DataLift, RML, Asio Tool Suite… Previous works
  • 8. 8  Existing approaches to map specific types of databases or map specific data formats to RDF  Each comes with its own mapping language or UI  Supporting a new system (data model and QL) not straightforward Previous works No unified mapping language to equally apply to most common databases (RDB, NoSQL, XML, LDAP, OO…) Supporting a new data model and/or QL  develop a DB connector but no change in the mapping language
  • 9. 9  Previous works  Background: R2RML and RML  Description of xR2RML  Evaluation and perspectives Agenda
  • 10. 10 R2RML – RDB To RDF Mapping Language  W3C recommendation, 2012  Goals: • Describe mappings of relational entities to RDF • Reuse of existing ontologies • Operationalization not addressed  How: TriplesMaps (TM) define how to generate RDF triples • 1 logical table  rows to process • 1 subject map  subject IRIs • N (predicate map-object map) couples • 1 opt. graph map  graph IRIs  An R2RML mapping is an RDF graph Triples
  • 11. 11 R2RML – RDB To RDF Mapping Language Id Acronym Centre_Id 10 CAC2010 4 Id Name address 4 Pasteur ... Study Centre FK R2RML mapping graph: Produced RDF: <#Centre> a rr:TriplesMap; rr:logicalTable [ rr:tableName "Centre" ]; rr:subjectMap [ rr:class ex:Centre; rr:template "http://example.org/centre#{Name}"; ]. <#Study> a rr:TriplesMap; rr:logicalTable [ rr:tableName “Study" ]; rr:subjectMap [ rr:class ex:Study; rr:template "http://example.org/study#{Id}"; ]; rr:predicateObjectMap [ rr:predicate ex:hasName; rr:objectMap [ rr:column "Acronym" ]; ]; rr:predicateObjectMap [ rr:predicate ex:locatedIn; rr:objectMap [ rr:parentTriplesMap <#Centre>; rr:joinCondition [ rr:child "Centre_id"; rr:parent "Id"; ]; ]; ]. <http://example.org/centre#Pasteur> a ex:Centre. <http://example.org/study#10> a ex:Study; ex:hasName "CAC2010"; ex:locatedIn <http://example.org/centre#Pasteur>.
  • 12. 12 <#Centre> rml:logicalSource [ rml:source “http://example.org/Centres.xml"; rml:referenceFormulation ql:XPath; rml:iterator “/centres/centre”: ]; rr:subjectMap [ rr:class ex:Centre; rr:template "http://example.org/centre#{//centre/@Id}"; ]; rr:predicateObjectMap [ rr:predicate ex:hasName; rr:objectMap [ rml:reference "//centre/name" ]; ]; RML extensions to R2RML <centres> <centre @Id="4"> <name>Pasteur</name> </centre> <centre @Id="6"> <name>Pontchaillou</name> </centre> </centres> Advantages: • Extends to CSV, JSON, XML sources • Map several sources simultaneously Limitations: • Fixed list of reference formulations • No distinction between reference formulation and query language • No RDF collections RML mapping graph:XML document:
  • 13. 13  Previous works  Background: R2RML and RML  Description of xR2RML  Evaluation and perspectives Agenda
  • 14. 14 xR2RML - Overall picture xR2RML Translation Engine xR2RML Mapping description Native QL Source database Flexible language to describe mappings from most common types of DB to RDF. Extends R2RML and leverages RML extensions. Domain ontologies refers to Domain ontologies uses
  • 15. 15 xR2RML: Logical source <#Centre> xrr:logicalSource [ xrr:query ’’’for $x in doc(“centres.xml”)/centres/centre where ... return $x’’’; ]; rr: R2RML vocabulary xrr: xR2RML vocabulary <centres> <centre @Id="4"> <name>Pasteur</name> </centre> <centre @Id="6"> <name>Pontchaillou</name> </centre> </centres> XML database supporting XQuey: xR2RML mapping graph:
  • 16. 16 xR2RML: Data element references <#Centre> xrr:logicalSource [ xrr:query ’’’for $x in doc(“centres.xml”)/centres/centre where ... return $x’’’; ]; rr:subjectMap [ rr:class ex:Centre; rr:template "http://example.org/centre#{//centre/@Id}"; ]; rr:predicateObjectMap [ rr:predicate ex:hasName; rr:objectMap [ xrr:reference "//centre/name" ]; ]; rr: R2RML vocabulary xrr: xR2RML vocabulary <centres> <centre @Id="4"> <name>Pasteur</name> </centre> <centre @Id="6"> <name>Pontchaillou</name> </centre> </centres> XML database supporting XQuey: xR2RML mapping graph:
  • 17. 17 xR2RML: Data element references <centres> <centre @Id="4"> <name>Pasteur</name> </centre> <centre @Id="6"> <name>Pontchaillou</name> </centre> </centres> XML database supporting XQuey: xR2RML mapping graph: rr: R2RML vocabulary xrr: xR2RML vocabulary <#Centre> xrr:logicalSource [ xrr:query ’’’for $x in doc(“centres.xml”)/centres/centre where ... return $x’’’; ]; rr:subjectMap [ rr:class ex:Centre; rr:template "http://example.org/centre#{//centre/@Id}"; ]; rr:predicateObjectMap [ rr:predicate ex:hasName; rr:objectMap [ xrr:reference “//centre/name" ]; ]; xR2RML engine usage guidelines Types of DB xrr:query xrr:reference rr:template RDB, Column stores SQL, CQL, HQL Column name Native XML DB XQuery XPath NoSQL doc. Store Proprietary JS-based JSONPath SPARQL endpoint SPARQL Variable name, Column name (s, p, o) Neo4J (graph db) Cypher Column name (s, p, o) LDAP directory LDAP Query Attribute name ... ... ...
  • 18. 18 { "studyid": 10, "acronym": "CAC2010", "centres": [ { "centreid": 4, "name": "Pasteur" }, { "centreid": 6, "name": "Pontchaillou" } ] } xR2RML: multiple values vs. RDF list/container Mapping case: link the study with the centres it involves <http://example.org/study#10> ex:involves “Pasteur”. <http://example.org/study#10> ex:involves “Pontchaillou”. <http://example.org/study#10> ex:involvesCenters ( “Pasteur” “Pontchaillou” )
  • 19. 19 { "studyid": 10, "acronym": "CAC2010", "centres": [ { "centreid": 4, "name": "Pasteur" }, { "centreid": 6, "name": "Pontchaillou" } ] } xR2RML: multiple values vs. RDF list/container Mapping case: link the study with the centres it involves rr:objectMap [ xrr:reference "$.centres.*.name“; rr:termType xrr:RdfList; ]; R2RML term types rr:IRI, rr:Literal, rr:BlankNode xR2RML term types xrr:RdfList, xrr:RdfSeq, xrr:RdfBag, xrr:RdfAlt
  • 20. xR2RML: nested collections
    From structured values (XML, JSON...), i.e. nested collections and key-value associations,
    to RDF: generate nested lists/containers and qualify their members (datatype, language tag...).
    E.g., produce a list of lists of strings:
      rr:objectMap [
        xrr:reference "...";
        rr:termType xrr:RdfList;
        xrr:nestedTermMap [
          xrr:reference "...";
          rr:termType xrr:RdfList;
          xrr:nestedTermMap [ rr:datatype xsd:string; ];
        ];
      ];
    Produced RDF:
      ( ( "John"^^xsd:string "Bob"^^xsd:string ) ( "Ted"^^xsd:string "Mark"^^xsd:string ) )
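The nested term maps can be read as a recursive rendering rule: each `xrr:nestedTermMap` describes the next level down, and the innermost one qualifies the leaf members. The sketch below assumes the nested values have already been extracted (the slide elides the references as "...") and represents the object map as a hypothetical Python dict; it renders Turtle collection syntax for the list-of-lists example.

```python
def render(value, term_map):
    """Render a value as Turtle following a hypothetical nested term map.

    term_map is {"termType": "RdfList", "nested": {...}} for a collection
    level, or {"datatype": ...} / {"language": ...} for leaf members.
    """
    if term_map.get("termType") == "RdfList":
        inner = term_map.get("nested", {})
        # Recurse into each member with the nested term map
        return "( " + " ".join(render(v, inner) for v in value) + " )"
    # Leaf: an escaped literal, optionally typed or language-tagged
    literal = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
    if "datatype" in term_map:
        literal += "^^" + term_map["datatype"]
    elif "language" in term_map:
        literal += "@" + term_map["language"]
    return literal


# Mirrors the object map of slide 20: a list of lists of xsd:string literals
OBJECT_MAP = {"termType": "RdfList",
              "nested": {"termType": "RdfList",
                         "nested": {"datatype": "xsd:string"}}}
```

Rendering `[["John", "Bob"], ["Ted", "Mark"]]` with `OBJECT_MAP` yields the produced RDF shown on the slide.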
  • 21. xR2RML: cross-references
    MongoDB database:
      Collection "studies": { "studyid": 10, "acronym": "CAC2010", "centres": [ 4, 6 ] }
      Collection "centres": { "centreid": 4, "name": "Pasteur" }, { "centreid": 6, "name": "Pontchaillou" }
    xR2RML mapping graph:
      <#Centre> xrr:logicalSource [ ... ]; rr:subjectMap [ ... ].
      <#Study> xrr:logicalSource [ ... ]; rr:subjectMap [ ... ];
        rr:predicateObjectMap [
          rr:predicate ex:involvesSeq;
          rr:objectMap [
            rr:parentTriplesMap <#Centre>;
            rr:joinCondition [ rr:child "$.centres.*"; rr:parent "$.centreid"; ];
            rr:termType xrr:RdfSeq;
          ];
        ].
    Produced RDF:
      <http://example.org/study#10> ex:involvesSeq [ a rdf:Seq;
        rdf:_1 <http://example.org/centre#Pasteur>;
        rdf:_2 <http://example.org/centre#Pontchaillou>; ].
  • 22. xR2RML: cross-references (same MongoDB data, mapping graph and produced RDF as slide 21)
    The join query is pushed to the database if the database supports it; otherwise the xR2RML engine performs the join itself.
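When the database cannot run the join, the engine has to match `rr:child` values against `rr:parent` values itself. The minimal sketch below assumes both collections fit in memory and uses a hash join, one possible strategy and not necessarily the one Morph-xR2RML uses. The parent subject template is elided on the slide, so `centre_iri` assumes a template built from the centre name, chosen to match the produced RDF of slide 21; all function names are hypothetical.

```python
STUDIES = [{"studyid": 10, "acronym": "CAC2010", "centres": [4, 6]}]
CENTRES = [{"centreid": 4, "name": "Pasteur"},
           {"centreid": 6, "name": "Pontchaillou"}]


def centre_iri(centre):
    # Assumed parent template: "http://example.org/centre#{$.name}"
    return f"<http://example.org/centre#{centre['name']}>"


def involves_seq(studies, centres):
    """Engine-side join: rr:child "$.centres.*" = rr:parent "$.centreid"."""
    # Build a hash index of the parent documents on the join key
    by_id = {}
    for centre in centres:
        by_id.setdefault(centre["centreid"], []).append(centre)
    result = {}
    for study in studies:
        subject = f"<http://example.org/study#{study['studyid']}>"
        members = [centre_iri(parent)
                   for child_value in study.get("centres", [])
                   for parent in by_id.get(child_value, [])]
        # xrr:RdfSeq: joined members become rdf:_1, rdf:_2, ... of one rdf:Seq
        result[subject] = [(f"rdf:_{i}", m) for i, m in enumerate(members, start=1)]
    return result


def to_turtle(result):
    """Serialize each study's sequence in the layout used on the slide."""
    lines = []
    for subject, members in result.items():
        body = "; ".join(f"{p} {o}" for p, o in members)
        lines.append(f"{subject} ex:involvesSeq [ a rdf:Seq; {body} ].")
    return "\n".join(lines)
```

Indexing the parents once keeps the join linear in the size of both collections, instead of rescanning the "centres" collection for every child value.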
  • 23. xR2RML: content with mixed formats
    Data with mixed content: in relational table "STAFF", column "Name" contains JSON data:
      ... | Name                                         | ...
      ... | { "FirstName": "Bob", "LastName": "Smith" }  | ...
    xR2RML mapping graph:
      <#Centre> xrr:logicalSource [ xrr:sourceName "STAFF"; ];
        ...
        rr:predicateObjectMap [
          rr:predicate ex:first-name;
          rr:objectMap [ xrr:reference "Column(Name)/JSONPath($.FirstName)" ];
        ].
  • 24. xR2RML: content with mixed formats (same data and mapping graph as slide 23)
    Syntax path constructors by data format:
      Data format | Syntax path constructor
      Row         | Column(), CSV(), TSV()
      XML         | XPath()
      JSON        | JSONPath()
      ...         | ...
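A reference like `Column(Name)/JSONPath($.FirstName)` chains path constructors: the first extracts a column value from the row, the second parses that value as JSON and applies a JSONPath to it. The sketch below is a hypothetical parser and evaluator, not the Morph-xR2RML code: it splits on `/` only outside parentheses (so an argument such as `XPath(/a/b)` stays intact), supports only `Column()` and a `$.key` JSONPath subset, and represents a relational row as a Python dict.

```python
import json
import re


def split_mixed_path(path):
    """Split 'Column(Name)/JSONPath($.FirstName)' into (constructor, expr) pairs."""
    parts, depth, current = [], 0, ""
    for ch in path:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Only a '/' outside any parentheses separates two constructors
        if ch == "/" and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    pairs = []
    for part in parts:
        match = re.fullmatch(r"(\w+)\((.*)\)", part.strip())
        if not match:
            raise ValueError(f"not a path constructor: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


def evaluate(row, path):
    """Evaluate a mixed-format path on one row (a dict of column values)."""
    value = row
    for kind, expr in split_mixed_path(path):
        if kind == "Column":
            value = value[expr]
        elif kind == "JSONPath":
            # Parse the cell content as JSON, then follow '.key' steps only
            value = json.loads(value) if isinstance(value, str) else value
            for key in expr.split(".")[1:]:
                value = value[key]
        else:
            raise NotImplementedError(f"constructor not supported: {kind}")
    return value
```

On the STAFF row, `evaluate({"Name": '{ "FirstName": "Bob", "LastName": "Smith" }'}, "Column(Name)/JSONPath($.FirstName)")` returns `"Bob"`.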
  • 25. Agenda
    ▪ Previous works
    ▪ Background: R2RML and RML
    ▪ Description of xR2RML main features
    ▪ Evaluation and perspectives
  • 26. Use case in Digital Humanities
    ▪ Use case: study the history and transmission of zoological knowledge across historical periods
    ▪ TAXREF taxonomical reference
      • Designed to support studies in Conservation Biology, enriched with bioarchaeological taxa
      • Maintained by the French National Museum of Natural History
      • ~450,000 terms, in CSV, JSON and XML formats
  • 27. Use case in Digital Humanities
    ▪ Ongoing work [2]: construction of a SKOS (1) thesaurus based on TAXREF
      • Import of TAXREF/JSON into MongoDB
      • Use of Morph-xR2RML, the prototype implementation of xR2RML, to convert the MongoDB data to RDF
      • Alignment with existing, well-adopted ontologies (e.g. NCBI Taxonomic Classification, GeoNames...):
        static alignments at mapping design time, and automatic alignment methods
    (1) SKOS: Simple Knowledge Organization System, a W3C RDF-based standard to represent controlled vocabularies, taxonomies and thesauri. It bridges the gap between existing KOS and the Semantic Web and Linked Data.
  • 28. Perspectives
    ▪ Ongoing discussion about the use of xR2RML to support ecological and agronomic studies
      • Large phenotype databases
    ▪ Consider the query rewriting approach to support large datasets
    ▪ How to write xR2RML mappings?
      • Automatic xR2RML mapping generation from a data schema (XSD/DTD, JSON Schema, JSON-LD...)
      • Schema mapping
      • Schema discovery
  • 29. Conclusions
    ▪ The data deluge keeps accelerating
    ▪ Data are stored in many kinds of databases
    ▪ xR2RML:
      • Flexible language to map the most common types of databases to RDF
      • Supports various data models and query languages
      • Rich features: RDF collections/containers, joins, content with mixed formats
    ▪ Applied to the construction of a SKOS thesaurus of TAXREF, a taxonomical reference
  • 30. Contacts: Franck Michel, Johan Montagnat, Catherine Faron-Zucker
    [2] C. Callou, F. Michel, C. Faron-Zucker, C. Martin, J. Montagnat. Towards a Shared Reference Thesaurus for Studies on History of Zoology, Archaeozoology and Conservation Biology. In SW4SH workshop, ESWC'15.
    [3] F. Michel, L. Djimenou, C. Faron-Zucker, J. Montagnat. xR2RML: Non-Relational Databases to RDF Mapping Language. Research report ISRN I3S/RR 2014-04-FR. http://hal.archives-ouvertes.fr/hal-01066663
    Morph-xR2RML: https://github.com/frmichel/morph-xr2rml/