The SPARQL Anything project
Enrico Daga and Luigi Asprino
The Web Conference - Developers Track
22/04/2021 - online @enridaga
Background
• Semantic Web developers have long been concerned with methods to “lift” legacy content to RDF:
• Tools targeting specific types/formats: SPARQL Microservices [Michel, 2019], Tarql, Any23, JSON2RDF, CSV2RDF
• Mapping languages of several kinds (e.g. RML, ShExML): high learning demands. [Dimou, 2014] [García-González, 2020]
• SPARQL Generate: learning demands, difficult to extend to other formats. [Lefrançois, 2017]
• These solutions transfer the complexity of the data source to the user (e.g. knowing XPath for XML, JsonPath for JSON, …)
• End-user development [Lieberman, 2006]: many SPARQL users fall into the category of end-user developers. In a recent survey, 42% of SPARQL users were from non-IT areas, including social sciences and the humanities, business and economics, and biomedical, engineering or physical sciences.
SPICE: Social Cohesion, Participation and Inclusion through Cultural Engagement
Polifonia: Digital Harmoniser of Musical Cultural Heritage
Cultural Heritage Knowledge Graphs
Sources in different formats × Multiple / unknown ontologies = Duplication of effort!
https://spice.kmi.open.ac.uk/
http://spice-h2020.eu
https://polifonia-project.eu/
This project has received funding from the European Union’s Horizon 2020 research and innovation programme
Knowledge Graph Construction
Composite process:
• Observe: the data source (e.g. a CSV file)
• Map: develop mappings to a target ontology
• Triplify: run the mappings and evaluate the result
• (many iterations)
KG construction is a twofold job:
• perform a syntax/structure conversion (e.g. from CSV to RDF)
• project semantics onto the data (applying a domain ontology)
Concept
… twofold job:
• perform a syntax/structure conversion -> Re-engineering
• We want to solve this problem once and for all
• project semantics onto the data (applying a domain ontology) -> Re-modelling
• We leave this to the end user, powered by SPARQL 1.1
• Approach: design a single RDF facade for any data format
• Re-engineering
• Focus on the syntax and the meta-model (data structure)
• Leave the data as-is as much as possible!
• apply the least possible “ontological commitment”
https://en.wikipedia.org/wiki/Facade_pattern
An RDF Facade?
Problem Space
• CSV
• JSON
• HTML
• XML
• Binary (JPEG, PNG, …)
• Text
Solution Space
• https://www.w3.org/TR/rdf11-concepts/
• https://www.w3.org/TR/rdf-schema/
rdf:type, rdf:Property, rdfs:label,
rdfs:Resource, rdfs:Class, rdf:Bag,
rdfs:Container, rdf:List, RDF Dataset,
Graph, …
Facade-X: (to be filled by picking and mixing from the solution space)
Oops! We are facing the same old problem … only this time we don’t care about the content (domain) and focus only on the format and data structure (meta-model)
CSV
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:Root a rdfs:Class .
id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093
…
https://github.com/tategallery/collection/blob/master/artist_data.csv
[ a fx:root ;
rdf:_1 [ xyz:dates "born 1930" ;
xyz:gender "Female" ;
xyz:id "10093" ;
xyz:name "Abakanowicz, Magdalena" ;
xyz:placeOfBirth "Polska" ;
xyz:placeOfDeath "" ;
xyz:url "http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093" ;
xyz:yearOfBirth "1930" ;
xyz:yearOfDeath ""
] ;
csv.headers=true|false
[ a fx:root ;
rdf:_1 [ rdf:_1 "id" ;
rdf:_2 "name" ;
rdf:_3 "gender" ;
rdf:_4 "dates" ;
rdf:_5 "yearOfBirth" ;
rdf:_6 "yearOfDeath" ;
rdf:_7 "placeOfBirth" ;
rdf:_8 "placeOfDeath" ;
rdf:_9 "url"
] ;
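The two CSV listings above can be reproduced with a minimal Python sketch. This is an illustration of the facade shape only, not the SPARQL Anything implementation: the function name `csv_to_facade`, the blank-node labels (`_:root`, `_:row1`, …) and the plain-tuple representation of triples are assumptions made for this example.

```python
import csv
import io

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FX = "http://sparql.xyz/facade-x/ns/"
XYZ = "http://sparql.xyz/facade-x/data/"


def csv_to_facade(text, headers=True):
    """Return Facade-X-style (subject, predicate, object) tuples for CSV text.

    Hypothetical helper: blank nodes are plain strings such as "_:row1".
    """
    rows = list(csv.reader(io.StringIO(text)))
    triples = [("_:root", RDF + "type", FX + "root")]
    # With headers, the first row supplies property names instead of data.
    names = rows[0] if headers and rows else None
    data_rows = rows[1:] if names else rows
    for i, row in enumerate(data_rows, start=1):
        row_node = f"_:row{i}"
        # Each row is the i-th member of the root container (rdf:_i).
        triples.append(("_:root", f"{RDF}_{i}", row_node))
        for j, cell in enumerate(row, start=1):
            if names and j <= len(names):
                predicate = XYZ + names[j - 1]  # xyz:<column name>
            else:
                predicate = f"{RDF}_{j}"  # positional: rdf:_j
            triples.append((row_node, predicate, cell))
    return triples


if __name__ == "__main__":
    sample = 'id,name,gender\n10093,"Abakanowicz, Magdalena",Female\n'
    for triple in csv_to_facade(sample, headers=True):
        print(triple)
```

Calling the function with `headers=False` treats the header line as an ordinary first row and uses `rdf:_1`, `rdf:_2`, … for the cells, which corresponds to the second listing. Column names containing spaces would not yield valid IRIs; the sketch does not handle that case.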
JSON
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:Root a rdfs:Class .
xsd:int a rdfs:Datatype.
xsd:string a rdfs:Datatype.
xsd:boolean a rdfs:Datatype.
xsd:decimal a rdfs:Datatype.
xsd:float a rdfs:Datatype.
xsd:double a rdfs:Datatype.
https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json
[ a fx:root ;
xyz:acno "T02319" ;
xyz:acquisitionYear "1978"^^<http://www.w3.org/2001/XMLSchema#int> ;
xyz:all_artists "Kazimir Malevich" ;
xyz:catalogueGroup [] ;
xyz:classification "painting" ;
xyz:contributorCount "1"^^<http://www.w3.org/2001/XMLSchema#int> ;
…
{
"acno": "T02319",
"acquisitionYear": 1978,
"all_artists": "Kazimir Malevich",
"catalogueGroup": {},
"classification": "painting",
"contributorCount": 1,
"contributors": [
{
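The JSON mapping shown above (objects become nodes with `xyz:` properties, arrays become containers with `rdf:_n` members, numbers become typed literals) can be sketched in Python as follows. This is a hedged illustration, not the project's code: `json_to_facade`, the `_:bN` labels and the `(lexical form, datatype)` tuple for typed literals are assumptions, as is the choice to skip JSON `null` values and to map floats to `xsd:double`.

```python
import json
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FX = "http://sparql.xyz/facade-x/ns/"
XYZ = "http://sparql.xyz/facade-x/data/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def json_to_facade(text):
    """Return Facade-X-style triples for a JSON document (hypothetical helper)."""
    triples = []
    ids = count()

    def visit(value):
        if isinstance(value, dict):
            node = f"_:b{next(ids)}"
            for key, child in value.items():
                if child is not None:  # assumption: nulls are skipped
                    triples.append((node, XYZ + key, visit(child)))
            return node
        if isinstance(value, list):
            node = f"_:b{next(ids)}"
            for i, child in enumerate(value, start=1):
                if child is not None:
                    triples.append((node, f"{RDF}_{i}", visit(child)))
            return node
        # bool is tested before int because bool is a subclass of int.
        if isinstance(value, bool):
            return ("true" if value else "false", XSD + "boolean")
        if isinstance(value, int):
            return (str(value), XSD + "int")
        if isinstance(value, float):
            return (str(value), XSD + "double")
        return value  # strings stay plain literals

    data = json.loads(text)
    if not isinstance(data, (dict, list)):
        raise ValueError("this sketch expects a JSON object or array at the top level")
    root = visit(data)
    triples.insert(0, (root, RDF + "type", FX + "root"))
    return triples


if __name__ == "__main__":
    doc = '{"acno": "T02319", "acquisitionYear": 1978, "catalogueGroup": {}}'
    for triple in json_to_facade(doc):
        print(triple)
```

An empty object such as `"catalogueGroup": {}` produces a node with no outgoing triples, which matches the `xyz:catalogueGroup []` line in the listing.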
DOM (HTML, XML, …)
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:Root a rdfs:Class .
xsd:int a rdfs:Datatype.
xsd:string a rdfs:Datatype.
xsd:boolean a rdfs:Datatype.
xsd:decimal a rdfs:Datatype.
xsd:float a rdfs:Datatype.
xsd:double a rdfs:Datatype.
rdf:type rdf:type rdf:Property
https://imma.ie/artists/
[ a fx:root , xhtml:div ;
xhtml:id "az-group" ;
rdf:_1 [ a xhtml:div ;
rdf:_1 [ a xhtml:h4 ;
rdf:_1 "A" ;
<https://html.spec.whatwg.org/#innerHTML>
"A" ;
<https://html.spec.whatwg.org/#innerText>
"A"
] ;
…
html.selector=#az-group
@prefix xhtml: <http://www.w3.org/1999/xhtml#> .
Binary and Text
Facade: http://sparql.xyz/facade-x/ns/
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fx: <http://sparql.xyz/facade-x/ns/>.
@prefix xyz: <http://sparql.xyz/facade-x/data/>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
rdf:Property a rdfs:Class .
rdfs:ContainerMembershipProperty
rdfs:subClassOf rdf:Property .
fx:Root a rdfs:Class .
xsd:int a rdfs:Datatype.
xsd:string a rdfs:Datatype.
xsd:boolean a rdfs:Datatype.
xsd:decimal a rdfs:Datatype.
xsd:float a rdfs:Datatype.
xsd:double a rdfs:Datatype.
xsd:base64Binary a rdfs:Datatype.
rdf:type rdf:type rdf:Property
[ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "/9j/
4AAQSkZJRgABAQEASABIAAD/
4QmsRXhpZgAASUkqAAgAAAALAA8BAgAGAAAAkgAAABABAgAOAAAAmAAAABIBAw
ABAAAAAQAAABoBBQABAAAApgAAABsBBQABAAAArgAAACgBAwABAAAAAgAAADEB
AgALAAAAtgAAADIBAgAUAAAAwgAAABMCAwABAAAAAgAAAGmHBAABAAAA1gAAAC
WIBAABAAAA0gMAAOQDAABDYW5vbgBDYW5vbiBFT1MgNDBEAEgAAAABAAAASAAA
AAEAAABHSU1QIDIuNC41AAAyMDA4OjA3OjMxIDEwOjM4OjExAB4Am…"^^<http://www.w3.org/2001/XMLSchema#base64Binary> ] .
bin.encoding # BASE64
txt.regex # tokenise into a sequence
https://imma.ie/collection/freeing-the-voice/
Hello World!
[ <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> "Hello World!" ] .
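The binary and text cases can be sketched in the same way. The slides only say that `bin.encoding` is BASE64 and that `txt.regex` tokenises the text into a sequence, so the details below are assumptions: the function names are hypothetical, each regex match is taken to be one token, and a typed literal is represented as a `(lexical form, datatype)` tuple.

```python
import base64
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def text_to_facade(text, regex=None):
    """Whole text as rdf:_1, or one rdf:_n member per regex match.

    Assumption: when a regex is given, each full match is one token.
    """
    if regex:
        tokens = [m.group(0) for m in re.finditer(regex, text)]
    else:
        tokens = [text]
    return [("_:root", f"{RDF}_{i}", tok) for i, tok in enumerate(tokens, start=1)]


def binary_to_facade(data):
    """Wrap raw bytes as a single xsd:base64Binary literal at rdf:_1."""
    encoded = base64.b64encode(data).decode("ascii")
    return [("_:root", RDF + "_1", (encoded, XSD + "base64Binary"))]


if __name__ == "__main__":
    print(text_to_facade("Hello World!"))
    print(text_to_facade("Hello World!", regex=r"\w+"))
    print(binary_to_facade(b"\xff\xd8\xff\xe0"))
```

Without a regex, the text case yields the single `rdf:_1 "Hello World!"` triple shown above; the binary case mirrors the base64 literal from the JPEG listing.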
https://sparql-anything.cc/
https://github.com/SPARQL-Anything/showcase-tate
Assumption: SPARQL 1.1 CONSTRUCT queries will be enough to design mappings (the re-modelling phase)
https://github.com/SPARQL-Anything/showcase-imma
https://imma.ie/collection/freeing-the-memory/
Preliminary feedback
• From 27 users with diverse SPARQL expertise
• Rated essential or very important:
• the system should minimise the languages or syntaxes needed
• mappings should be easy to read and interpret
• the system must be easy to learn for a Semantic Web practitioner
• the system should support new types of data sources without changes to the mapping language
• How easy is this code to understand (comparing equivalent mappings)?
• (a) RML
• (b) SPARQL Generate
• (c) SPARQL Anything
Benefits
• Transform / Query resources having heterogeneous formats
• Low learning demands (plain SPARQL 1.1)
• Minimise complexity of the mappings
• A single, consistent abstraction for any data format
• Enable data exploration in the absence of a domain ontology
• Integrate with a typical Semantic Web engineering workflow
• Flexible and adaptable (Facade-X can be extended, if needed)
• Easy to extend:
• new transformers just need to return the facade
• no major changes to the user experience
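The extension point described above ("new transformers just need to return the facade") can be illustrated with a small registry sketch. Everything here is hypothetical and only shows the design idea: the `register` decorator, the `TRANSFORMERS` table, the made-up `lines` format and the tuple representation of triples are not taken from SPARQL Anything.

```python
from typing import Callable, Dict, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

Triple = Tuple[str, str, object]
Transformer = Callable[[str], List[Triple]]

# Hypothetical registry: file extension -> function returning facade triples.
TRANSFORMERS: Dict[str, Transformer] = {}


def register(extension: str):
    """Decorator that adds a transformer for one file extension."""
    def decorator(fn: Transformer) -> Transformer:
        TRANSFORMERS[extension] = fn
        return fn
    return decorator


@register("txt")
def text_transformer(content: str) -> List[Triple]:
    return [("_:root", RDF + "_1", content)]


@register("lines")  # a made-up format, added without touching anything else
def lines_transformer(content: str) -> List[Triple]:
    return [
        ("_:root", f"{RDF}_{i}", line)
        for i, line in enumerate(content.splitlines(), start=1)
    ]


def triplify(name: str, content: str) -> List[Triple]:
    """Pick the transformer from the file extension and return its triples."""
    extension = name.rsplit(".", 1)[-1].lower()
    return TRANSFORMERS[extension](content)


def members(triples: List[Triple]) -> list:
    """A 'query' written once against the facade: all container members."""
    return [o for _, p, o in triples if p.startswith(RDF + "_")]


if __name__ == "__main__":
    print(members(triplify("notes.txt", "first\nsecond")))
    print(members(triplify("notes.lines", "first\nsecond")))
```

The point of the design is that `members` never changes when a format is added: user-facing queries depend only on the facade, not on the source format.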
Challenges
• No commitment on the internal machinery! (It is a gift and a curse …)
• Current version v0.1.1 (we started Nov 2020):
• implemented on top of Apache Jena ARQ
• limited to files
• loads the triples in-memory and then performs the query
• A triple-filtering strategy reduces the in-memory dataset
• Very large files require very large memory
• Next: develop strategies to cope with large resources (e.g. slicing)
• Next: develop query-rewriting strategies, eventually rewriting mappings into efficient, iterator-based transformers (mapping translation [Corcho, 2020])
• Next: relational databases, NoSQL (e.g. MongoDB)
• Reuse existing approaches (e.g. OBDA) but hide the complexity from the user
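The slides do not describe how the triple filtering mentioned above is implemented. One plausible reading, sketched here under that assumption, is that only triples matching some triple pattern of the query are kept while the source is being read, so the rest never reaches memory. The names `matches` and `filter_triples`, and the use of `None` as a wildcard, are choices made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XYZ = "http://sparql.xyz/facade-x/data/"


def matches(triple, pattern):
    """True if every non-None position of the pattern equals the triple's."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def filter_triples(triples, patterns):
    """Lazily yield only the triples that match at least one pattern.

    `triples` can be any iterable, including a generator that reads the
    source incrementally, so discarded triples are never stored.
    """
    for triple in triples:
        if any(matches(triple, pattern) for pattern in patterns):
            yield triple


if __name__ == "__main__":
    source = iter([
        ("_:row1", XYZ + "name", "Abakanowicz, Magdalena"),
        ("_:row1", XYZ + "gender", "Female"),
        ("_:row1", XYZ + "yearOfBirth", "1930"),
    ])
    # A query that only asks for names needs a single pattern.
    wanted = [(None, XYZ + "name", None)]
    print(list(filter_triples(source, wanted)))
```

This only reduces what is held in memory; the whole source is still read once, which is consistent with the remark that very large files remain a challenge.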
Get in touch!
SPARQL Anything is under active development
https://github.com/SPARQL-Anything/sparql.anything
enrico.daga@open.ac.uk
@enridaga
www.enridaga.net
References
• Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-X: an opinionated approach to SPARQL Anything (submitted). In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021)
• Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
• Warren, P., Mulholland, P.: Using SPARQL – the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018)
• Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
• Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019)
• Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014)
• García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science 6, e318 (2020)
• Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in end-user software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50. Springer (2017)
• Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: an emerging paradigm. In: End User Development, pp. 1–8. Springer (2006)
• Cyganiak, R.: Tarql (SPARQL for tables): turn CSV into RDF using SPARQL syntax. Technical report (2015). http://tarql.github.io
