Declarative Data Transformations for Linked Data Generation: the case of DBpedia
Ben De Meester, Wouter Maroy, Anastasia Dimou,
Ruben Verborgh, and Erik Mannens
Ghent University – imec – IDLab, Belgium
ESWC2017 In-Use - Declarative Data Transformations for Linked Data Generation: the case of DBpedia
In loving memory of the Barack Obama examples in Semantic Web conferences
How to create Linked Barack?
dbr:Barack_Obama
  dbp:name "Barack Obama"@en
  dbo:birthPlace dbr:Hawaii
  dbp:termStart "20-01-2009"
  dbp:birthDate "04-08-1961"
  …
Linked Barack is based on schema and data
A specific case…
Source: handle WikiText
Schema transformations: use a custom schema (DBpedia ontology)
Data transformations: parse manually entered input data
https://en.wikipedia.org/wiki/Barack_Obama
https://en.wikipedia.org/wiki/Leopold_II_of_Belgium
… needs a specific solution?
… → select → extract → transform schema → transform data
https://github.com/dbpedia/extraction-framework
Data transformations are hard-coded in the DBpedia EF
Hard-coded means:
case-specific
coupled with the implementation
You can’t:
use the DBpedia EF for other cases
use the parsing functions outside the DBpedia EF
Declarative schema transformations are great
Use-case independent
Decoupled from the implementation
Declarative data transformations make Linked Data generation 🚀🚀🚀🚀
Declarative schema transformations (i.e., semantic annotation rules) are great,
so why not also for data transformations?
Outline
The current situation
  existing approaches
  disadvantages
What we provide
  our approach
  implementation
Existing approaches
direct mappings | successive steps | embedded data transformations | hard-coded
Direct mappings
From original data to RDF with minimal change
e.g., CSVW, JSON(-LD)
Restricted: neither schema nor data transformations
"[[Honolulu]], [[Hawaii]], U.S." → dbr:Honolulu? dbr:Hawaii?
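The raw birth_place value above is WikiText, and a direct mapping can only copy it into RDF as is. A minimal Python sketch of the data transformation that is missing, assuming links of the simple `[[Target]]` or `[[Target|label]]` form and the `dbr:` plus underscores naming convention; `wiki_links_to_resources` is a hypothetical helper, not part of CSVW, JSON-LD or the DBpedia EF.

```python
import re

# Hypothetical pattern for simple wiki links: [[Target]] or [[Target|label]].
# Nested templates and other WikiText constructs are deliberately not handled.
LINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def wiki_links_to_resources(value: str) -> list[str]:
    """Return a 'dbr:' resource name for every wiki link in a raw infobox value."""
    return ["dbr:" + target.strip().replace(" ", "_") for target in LINK.findall(value)]

raw = "[[Honolulu]], [[Hawaii]], U.S."
print("direct mapping:", repr(raw))               # the literal is kept unchanged
print("transformed:", wiki_links_to_resources(raw))
```

The sketch returns both resources: deciding which one counts as the birth place is still a separate, case-specific rule.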
Successive steps
First data, then schema transformations (or vice versa)
e.g., R2RML
Restricted: depends on the underlying system for data transformations
e.g., SQL views for R2RML
Uncombinable: how to combine schema and data transformations?
e.g., "Born" should return a date
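The "uncombinable" problem can be made concrete with a two-stage sketch: stage 1 transforms values (the role an SQL view plays for R2RML), stage 2 maps columns to properties. The knowledge that dbp:birthDate needs a date lives in stage 1, while the choice of dbp:birthDate lives in stage 2, so both must be kept in sync by hand. A minimal Python sketch; the `born` column, its ISO-formatted input value and all function names are hypothetical.

```python
from datetime import date

def data_step(row: dict[str, str]) -> dict[str, object]:
    # Stage 1 (e.g. an SQL view): works on source columns, unaware of the target schema.
    return {"name": row["name"], "born": date.fromisoformat(row["born"])}

# Stage 2 (e.g. an R2RML mapping): column -> property, no access to data transformations.
SCHEMA = {"name": "dbp:name", "born": "dbp:birthDate"}

def schema_step(subject: str, row: dict[str, object]) -> list[tuple[str, str, object]]:
    return [(subject, SCHEMA[column], value) for column, value in row.items()]

source_row = {"name": "Barack Obama", "born": "1961-08-04"}
for triple in schema_step("dbr:Barack_Obama", data_step(source_row)):
    print(triple)
# Mapping another column to dbp:birthDate means editing SCHEMA *and* data_step.
```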
Embedded data transformations
The tool supports a limited set of data transformations
e.g., OpenRefine
Restricted: limited set of data transformations
parsing is more than splitting a string or applying one regular expression
Coupled: the types of data transformations depend on the tool
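To make "parsing is more than one regular expression" concrete: even a toy date parser has to recognise templates by name, skip named arguments and fall back to several free-text formats. A minimal Python sketch under those assumptions; `parse_date`, the template names it accepts and the two text formats are illustrative choices, far from the coverage of the DBpedia EF parsers, and `%B` assumes an English (default C) locale.

```python
from datetime import date, datetime

DATE_TEMPLATES = {"birth date", "birth date and age", "start date"}  # illustrative subset
TEXT_FORMATS = ("%B %d, %Y", "%d %B %Y")  # "January 20, 2009", "20 January 2009"

def parse_date(value: str) -> str:
    """Hypothetical simplified WikiText date parser; returns DD-MM-YYYY as on the slides."""
    value = value.strip()
    if value.startswith("{{") and value.endswith("}}"):
        name, *args = (part.strip() for part in value[2:-2].split("|"))
        if name.lower() not in DATE_TEMPLATES:
            raise ValueError(f"unsupported template: {name!r}")
        # Positional arguments are year|month|day; named ones such as df=yes are skipped.
        year, month, day = [int(arg) for arg in args if "=" not in arg][:3]
        return date(year, month, day).strftime("%d-%m-%Y")
    for fmt in TEXT_FORMATS:  # free text: try each known layout in turn
        try:
            return datetime.strptime(value, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")

print(parse_date("{{birth date and age|1961|8|4}}"))
print(parse_date("January 20, 2009"))
```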
Hard-coded
DBpedia EF
select
{{Infobox president
|name = Barack Obama
|image = President Barack Obama.jpg
|office = President of the United States
|vicepresident = [[Joe Biden]]
|birth_place = [[Honolulu]], [[Hawaii]], U.S.
|term_start = January 20, 2009
|term_end = January 20, 2017
|birth_date = {{birth date and age|1961|8|4}}
|birth_name = Barack Hussein Obama II
…
extract
raw values: "Barack Obama", "January 20, 2009", "[[Honolulu]], [[Hawaii]], U.S.", "{{birth date and age|1961|8|4}}"
transform schema
dbr:Barack_Obama with dbp:name, dbp:termStart, dbo:birthPlace, dbp:birthDate
transform data
"Barack Obama"@en, "20-01-2009", dbr:Hawaii, "04-08-1961"
…
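The select, extract and transform-schema stages above can be sketched in a few lines of Python, assuming one `|key = value` field per line and a hand-written key-to-property table (`MAPPING` is a hypothetical subset, not the actual DBpedia mapping). Transform data is deliberately left out, so the values still come out raw, as after the extract step.

```python
INFOBOX = """{{Infobox president
|name = Barack Obama
|birth_place = [[Honolulu]], [[Hawaii]], U.S.
|term_start = January 20, 2009
|birth_date = {{birth date and age|1961|8|4}}
}}"""

# transform schema: infobox key -> property (hypothetical subset of a DBpedia mapping)
MAPPING = {"name": "dbp:name", "birth_place": "dbo:birthPlace",
           "term_start": "dbp:termStart", "birth_date": "dbp:birthDate"}

def select(pages: dict[str, str], template: str = "Infobox president") -> dict[str, str]:
    # select: keep only pages that use the infobox template of interest
    return {title: text for title, text in pages.items() if "{{" + template in text}

def extract(text: str) -> dict[str, str]:
    # extract: one "|key = value" per line; nested templates stay inside the value
    fields = {}
    for line in text.splitlines():
        if line.startswith("|") and "=" in line:
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def transform_schema(title: str, fields: dict[str, str]) -> list[tuple[str, str, str]]:
    subject = "dbr:" + title.replace(" ", "_")
    return [(subject, MAPPING[key], value) for key, value in fields.items() if key in MAPPING]

pages = {"Barack Obama": INFOBOX, "Some other page": "No infobox here."}
for title, text in select(pages).items():
    for triple in transform_schema(title, extract(text)):
        print(triple)
```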
Hard-coded: disadvantages
Coupled: data transformations only usable in that implementation
Case-specific: only for one use case
Disadvantages of current approaches
Restricted
Uncombinable
Coupled
Case-specific
What do we want?
Unrestricted data transformations
Combinable schema and data transformations
Decoupled from the implementation
Case-independent solution
Aligned declarative schema and declarative data transformations
Aligned
combine data and schema transformations
Declarative data transformations
no restrictions
reuse outside the generation framework
Aligned declaratives
neither use-case nor implementation specific
Implementation: declaratives | tools
Declaratives
Declarative schema transformations: source-agnostic, schema-agnostic
RDF Mapping Language (RML) | http://RML.io
Declarative data transformations: implementation-agnostic
Function Ontology (FnO) | http://FnO.io
Aligned: FunctionMap / functionValue
Connection between RML and FnO
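The point of FnO is that a function is described (identifier, parameters, output) separately from any implementation. A minimal Python sketch of that separation, assuming descriptions held as plain data and a separate table binding identifiers to code; `FunctionDescription`, `IMPLEMENTATIONS`, `execute` and the ISO-only stand-in implementation are hypothetical, and actual FnO descriptions are RDF, not Python objects.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

@dataclass(frozen=True)
class FunctionDescription:
    """Implementation-agnostic description: what the function is, not how it runs."""
    identifier: str
    parameters: tuple[str, ...]
    output: str

DATE_PARSER = FunctionDescription("DBpedia_date_parser", ("inputString",), "date")

def iso_stand_in(inputString: str) -> str:
    # Stand-in implementation that only understands YYYY-MM-DD input.
    return date.fromisoformat(inputString).strftime("%d-%m-%Y")

# Bindings live outside the description; swapping them does not touch any mapping.
IMPLEMENTATIONS: dict[str, Callable[..., str]] = {DATE_PARSER.identifier: iso_stand_in}

def execute(description: FunctionDescription, **arguments: str) -> str:
    missing = set(description.parameters) - set(arguments)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return IMPLEMENTATIONS[description.identifier](**arguments)

print(execute(DATE_PARSER, inputString="1961-08-04"))
```

Replacing `iso_stand_in` with a full WikiText parser changes the binding only; the description, and any mapping that refers to it, stays the same.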
RML mapping
Person_Mapping: source WikiText, subject dbr:{wiki_label}
birthDate_Mapping: predicate dbp:birthDate, reference birth_date
Result: dbr:Barack_Obama dbp:birthDate "{{birth date and age|1961|8|4}}"
FnO mapping
DBP_Parsing_Function: executes DBpedia_date_parser, inputString birth_date
Result: "04-08-1961"
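Conceptually, the RML side of this example fills the subject template `dbr:{wiki_label}` and copies the `birth_date` field into the object, so without a data transformation the raw template text ends up in the triple. A minimal Python sketch of that evaluation, assuming the mapping is a plain dictionary, the WikiText extractor has already produced a field dictionary, and spaces become underscores as in DBpedia names; this is neither the RML vocabulary nor the RMLProcessor API.

```python
# Hypothetical in-memory version of Person_Mapping with its birthDate_Mapping.
PERSON_MAPPING = {
    "subject": "dbr:{wiki_label}",
    "predicate_object": [{"predicate": "dbp:birthDate", "reference": "birth_date"}],
}

def fill_template(template: str, record: dict[str, str]) -> str:
    # Template-style subject: {field} is replaced by the record's value (spaces -> "_").
    return template.format(**{k: v.replace(" ", "_") for k, v in record.items()})

def generate(mapping: dict, record: dict[str, str]) -> list[tuple[str, str, str]]:
    subject = fill_template(mapping["subject"], record)
    # A reference copies the field value as-is: no data transformation happens here.
    return [(subject, pom["predicate"], record[pom["reference"]])
            for pom in mapping["predicate_object"]]

record = {"wiki_label": "Barack Obama", "birth_date": "{{birth date and age|1961|8|4}}"}
print(generate(PERSON_MAPPING, record))
```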
Separate RML and FnO
RML: Person_Mapping (source WikiText, subject dbr:{wiki_label}) with birthDate_Mapping (predicate dbp:birthDate, reference birth_date)
FnO: DBP_Parsing_Function (executes DBpedia_date_parser, inputString birth_date)
Aligned RML and FnO
Person_Mapping: source WikiText, subject dbr:{wiki_label}
birthDate_Mapping: predicate dbp:birthDate, object generated by a FunctionMap
FunctionMap → DBP_Parsing_Function: executes DBpedia_date_parser, inputString birth_date
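With the alignment, the object of birthDate_Mapping is no longer a plain reference but a FunctionMap: the processor resolves the function's inputs from the record, calls the function, and uses its result (the functionValue) as the object. A minimal Python sketch of that evaluation step, assuming dictionaries for the maps and a toy `dbpedia_date_parser` that only understands the `{{birth date and age|Y|M|D}}` template; all names other than those on the slides are hypothetical.

```python
from datetime import date

def dbpedia_date_parser(inputString: str) -> str:
    # Toy stand-in: only handles "{{birth date and age|YYYY|M|D}}".
    year, month, day = (int(part) for part in inputString.strip("{}").split("|")[1:4])
    return date(year, month, day).strftime("%d-%m-%Y")

FUNCTIONS = {"DBpedia_date_parser": dbpedia_date_parser}

BIRTHDATE_MAPPING = {
    "predicate": "dbp:birthDate",
    "function_map": {"executes": "DBpedia_date_parser",
                     "inputs": {"inputString": "birth_date"}},  # parameter -> field
}

def object_value(pom: dict, record: dict[str, str]) -> str:
    if "function_map" in pom:
        fm = pom["function_map"]
        args = {param: record[field] for param, field in fm["inputs"].items()}
        return FUNCTIONS[fm["executes"]](**args)  # functionValue replaces the raw value
    return record[pom["reference"]]              # plain reference: raw value

record = {"birth_date": "{{birth date and age|1961|8|4}}"}
print(("dbr:Barack_Obama", BIRTHDATE_MAPPING["predicate"], object_value(BIRTHDATE_MAPPING, record)))
```

Because `object_value` falls back to the plain reference, the same evaluator covers both the separate and the aligned mapping; only the mapping data differs.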
Practical - Implementation
RMLProcessor
include WikiText extractor
support FunctionMap / functionValue
connect to FunctionProcessor
FunctionProcessor
dynamically load and call function
External DBpedia Parsing functions
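The FunctionProcessor loads and calls a function that is only known by name at runtime. A Python analogue, assuming a hypothetical `module:function` binding string; it relies only on `importlib.import_module` and `getattr`, whereas the linked function-processor-java runs on the JVM, so this mirrors the idea rather than that implementation.

```python
import importlib
from typing import Any, Callable

def load_function(binding: str) -> Callable[..., Any]:
    """Resolve a hypothetical 'package.module:function' binding at runtime."""
    module_name, _, function_name = binding.partition(":")
    if not module_name or not function_name:
        raise ValueError(f"expected 'module:function', got {binding!r}")
    module = importlib.import_module(module_name)  # load the module dynamically
    return getattr(module, function_name)         # look up the function by name

def call_function(binding: str, *args: Any, **kwargs: Any) -> Any:
    # The caller only knows the binding string, never the implementation itself.
    return load_function(binding)(*args, **kwargs)

# Standard-library functions stand in for external parsing libraries here.
print(call_function("urllib.parse:quote", "Barack Obama"))
print(call_function("datetime:date", 1961, 8, 4))
```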
… → select → extract → transform schema + transform data → …
RML_FnO-doc | Function Processor
Our approach generates the same DBpedia data, and:
You don’t depend on the implementation
You don’t depend on the use case
DBpedia parsing functions can be reused elsewhere
Data transformations can use existing or new external libraries
See it in action!
Booth 49
https://fnoio.github.io/dbpedia-demo/
https://github.com/RMLio/RML-Mapper/tree/extension-fno
https://github.com/FnOio/function-processor-java
https://github.com/FnOio/dbpedia-parsing-functions-scala

Editor's Notes

  • #3: I’d like to give a small warning here. Even though numerous people told us not to…
  • #4: I am going to use Barack Obama. It’s probably one of the last times we can use our mascot.
  • #5: Barack’s data in DBpedia…
  • #7: The schema is very specific (and so are the schema transformations); the data transformations are very specific too.
  • #8: It is mostly generated using the DBpedia EF (specifically for the infoboxes): relevant pages are selected, the infoboxes are extracted, the values are put in a certain schema (schema transformations), and finally the values themselves are transformed (data transformations).
  • #9: However, they are embedded in the EF. That’s a pity, because DBpedia is so widely used. The EF has been tested on thousands of Wikipedia pages, but you can’t use it for other use cases, and you can’t use the parsing functions outside the DBpedia EF. As we will see later, no current solution can cope with more advanced data transformations.
  • #10: More clear
  • #11: So, declarative schema transformations exist. They make generating Linked Data possible without depending on the implementation or the use case. Great. However, due to the high demands on data transformations, current generation approaches cannot be used. What about declarative data transformations?! That would be awesome!
  • #14: split?
  • #17: It is mostly generated using the DBpedia EF (specifically for the infoboxes): relevant pages are selected, the infoboxes are extracted, the values are put in a certain schema (schema transformations), and finally the values themselves are transformed (data transformations).
  • #18: so, you have all pages
  • #19: you select the ones with relevant infoboxes
  • #20: The infoboxes are extracted (the following are simplified examples)
  • #21: use a custom mapping document for the schema transformations
  • #22: Then, the data is parsed to get ‘good data’. This is a very important part of DBpedia: as the data values are entered in Wikipedia manually, the input data can be very diverse, with typos and different ways of writing things. A great deal of effort has gone into creating these parsing functions, and they are really good.
  • #31: not the only solution
  • #34: instead of using the raw value directly…
  • #35: use value after being parsed by underlying function
  • #36: use value after being parsed by underlying function
  • #38: support wikitext: was easy (RMLProcessor is made for that, natural thing to do)
  • #47: [ ] two-sentence recap