DBGroup@UNIMO
Fabio Benedetti, Sonia Bergamaschi, Laura Po
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
The 2015 IEEE/WIC/ACM International Conference on Web Intelligence
Laura Po, “Exposing the underlying schema of LOD sources”
★ publish data on the Web under an open license
★ ★ make data available as structured data
★ ★ ★ make data available in a non-proprietary open format
★ ★ ★ ★ use URIs to denote things
★ ★ ★ ★ ★ link your data to other data to provide context
★ ★ ★ ★ ★ document your data in a top-down fashion
In 2006, Tim Berners-Lee coined the term “Linked Data”
The LOD Cloud
• more than one thousand interlinked datasets
• several billion RDF triples
Each LOD source
• varies widely in size, from thousands to billions of triples
A tool for promoting the understanding, navigation
and querying of LOD sources
Requirements
• portable to the LOD Cloud
• provide a synthetic representation of the structure of
the dataset (Schema Summary, Clustered Schema Summary)
• provide visual query building functionalities hiding
the complexity of Semantic Web technologies
[Figures: Schema Summary and Clustered Schema Summary]
• A tool for exploring and querying LOD sources
+ navigation of large LOD sources
Try LODeX at: http://dbgroup.unimo.it/lodex2
http://www.dbgroup.unimo.it/lodex2/testCluster
Future work
• New filtering and clustering techniques
• An interactive exploration that starts from the highest level and can be refined down to the lowest level
• Query functionalities on the Clustered Schema Summary (CSS): mapping functionalities to convert a visual query on the CSS into a SPARQL query on the LOD endpoint
Thanks for your attention!
Come to see the poster!
• F. Benedetti, S. Bergamaschi, and L. Po. Exposing the underlying schema of LOD sources. WI 2015.
• F. Benedetti, S. Bergamaschi, and L. Po. LODeX: A tool for Visual Querying Linked Open Data. ISWC 2015 (Posters & Demonstrations Track).
• F. Benedetti, S. Bergamaschi, and L. Po. Visual Querying LOD sources with LODeX. K-CAP 2015.
• F. Benedetti, S. Bergamaschi, and L. Po. A visual summary for linked open data sources. ISWC 2014 (Posters & Demonstrations Track).
• F. Benedetti, S. Bergamaschi, and L. Po. Online index extraction from linked open data sources. Linked Data for Information Extraction (LD4IE) Workshop held at ISWC 2014.
• Each RDF graph is composed of a set of vertices V and a set of labelled edges E. The vertices can be divided into three disjoint sets: the URIs U, the blank nodes B and the literals L.
• Two vertices connected by an edge represent a statement. Each statement is stored as a <subject, predicate, object> triple, where subject ∈ (U ∪ B), object ∈ V and predicate ∈ E.
• We can define the whole RDF graph as a set of triples RG:
RG ⊆ (U ∪ B) × E × V
• The rdf:type property is used to state that a certain resource is an instance of a class. We define the set of classes as Cs:
Cs = {c | <i, rdf:type, c> ∈ RG ∧ i ∈ (U ∪ B)}
• We call partial cluster of classes (PC) a set of classes that concur in the multiple instantiation of the same resource:
PC(i) = {c | <i, rdf:type, c> ∈ RG ∧ i ∈ (U ∪ B)}
• and each PC(i) ⊆ Cs
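The definitions above can be sketched on a toy graph. This is only an illustration: triples are modelled as plain Python tuples of strings, the names `classes` and `partial_cluster` are hypothetical, and the extra type `foaf:Agent` is invented to show a resource with more than one class.

```python
RDF_TYPE = "rdf:type"

# Toy RDF graph RG as a set of (subject, predicate, object) triples.
RG = {
    ("organization1", RDF_TYPE, "foaf:Organization"),
    ("organization2", RDF_TYPE, "foaf:Organization"),
    ("person1", RDF_TYPE, "foaf:Person"),
    ("person1", RDF_TYPE, "foaf:Agent"),  # hypothetical second type for person1
    ("organization1", "ex:ceo", "person1"),
}


def classes(rg):
    """Cs: every c that appears as the object of an rdf:type triple."""
    return {o for (_, p, o) in rg if p == RDF_TYPE}


def partial_cluster(rg, i):
    """PC(i): the classes that resource i is declared an instance of."""
    return {o for (s, p, o) in rg if s == i and p == RDF_TYPE}


print(sorted(classes(RG)))
print(sorted(partial_cluster(RG, "person1")))
```

By construction every partial cluster is a subset of the set of classes, which the sketch lets you check with `partial_cluster(RG, i) <= classes(RG)`.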
• The partial clusters of classes (PC) are sets of classes that concur in the multiple instantiation of the same resource:
PC(i) = {c | <i, rdf:type, c> ∈ RG ∧ i ∈ (U ∪ B)}
• By examining all the instances in an RDF graph RG, we find different PCs.
• The collection of all the PCs that occur in RG is called the family of PCs, C:
C = {PC(i) : i ∈ (U ∪ B)}
• C contains a particular family of sets able to generate all the other sets. We call this family the family of super sets (S), and we define it as follows:
S = {ST ∈ C : ∄ PC ∈ C ∧ PC ⊃ ST}
• For each set st ∈ S, a class ca ∈ st must be elected to represent the entire set of classes. This class is called the candidate agent of the superset. For each superset, we choose as candidate agent the class with the highest number of instances.
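A minimal sketch of these steps, under stated assumptions: the family of super sets is read as the sets of C that are not strictly contained in any other set of C, ties between equally populated classes are broken alphabetically (the slide does not say how), and the function names and the `ex:` classes are hypothetical.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Toy graph: resource "a" has two types, "b" and "c" have one each.
RG = {
    ("a", RDF_TYPE, "ex:Person"),
    ("a", RDF_TYPE, "ex:Student"),
    ("b", RDF_TYPE, "ex:Person"),
    ("c", RDF_TYPE, "ex:City"),
}


def family_of_pcs(rg):
    """C: the distinct partial clusters PC(i) over all typed resources i."""
    by_instance = {}
    for s, p, o in rg:
        if p == RDF_TYPE:
            by_instance.setdefault(s, set()).add(o)
    return {frozenset(pc) for pc in by_instance.values()}


def super_sets(family):
    """S: members of C that are not strictly contained in another member."""
    return {st for st in family if not any(st < pc for pc in family)}


def candidate_agent(rg, st):
    """Class of st with the most instances (alphabetical tie-break is an assumption)."""
    counts = Counter(o for _, p, o in rg if p == RDF_TYPE and o in st)
    return min(st, key=lambda c: (-counts[c], c))


for st in super_sets(family_of_pcs(RG)):
    print(sorted(st), "->", candidate_agent(RG, st))
```

Here {ex:Person} is absorbed by the super set {ex:Person, ex:Student}, and ex:Person is elected as its candidate agent because it has two instances against one for ex:Student.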
The Schema Summary is a pseudograph composed of:
• C - Classes (nodes)
• P - Properties (edges)
and additional elements and functions:
• A - Attributes associated with each class
– Each attribute represents the existence of a datatype property from the instances of the class
• 𝒍 - labels
• l – labeling function
• count - count function
The Schema Summary is inferred from the distribution of the instances of a dataset
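One possible in-memory shape for such a pseudograph is sketched below. The class name `SchemaSummary`, its field names and the `foaf:knows` edge are hypothetical; labels are simply the prefixed names used as dictionary keys, and this is not presented as the data model LODeX actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaSummary:
    # C: class nodes, each mapped to its number of instances.
    classes: dict[str, int] = field(default_factory=dict)
    # P: property edges keyed by (source class, property label, target class).
    # A pseudograph allows self-loops and parallel edges, so the label is part of the key.
    properties: dict[tuple[str, str, str], int] = field(default_factory=dict)
    # A: datatype-property attributes keyed by (class, property label).
    attributes: dict[tuple[str, str], int] = field(default_factory=dict)

    def count(self, element):
        """count function: the number recorded for a node, edge, or attribute."""
        for table in (self.classes, self.properties, self.attributes):
            if element in table:
                return table[element]
        raise KeyError(element)


summary = SchemaSummary()
summary.classes["foaf:Person"] = 1
summary.attributes[("foaf:Person", "foaf:firstName")] = 1
# Hypothetical self-loop, legal in a pseudograph.
summary.properties[("foaf:Person", "foaf:knows", "foaf:Person")] = 3
print(summary.count("foaf:Person"))
```

Keying edges by the full (source, label, target) triple keeps two different properties between the same pair of classes as separate edges.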
These indexes belong to the extensional group of the Statistical Indexes [2]:
• SC (Subject Class) contains the pairs (p,c) where p is an object property and c is its domain class.
• SCl (Subject Class to literal) contains the pairs (p,c) where p is a datatype property and c is its domain class.
• OC (Object Class) contains the pairs (p,c) where p is an object property and c is its range class.
[Figure: example RDF graph. Extensional classes: ex:Sector, foaf:Organization, foaf:Person. Extensional knowledge: the instances sector1, organization1, organization2 and person1, typed via rdf:type and connected by ex:sector, ex:ceo, dc:title, ex:activity, dbpedia:fax, foaf:firstName and foaf:lastName; literals “Energy”, “Village electrification in the Pacific”, “+41331231”, “Paolo”, “Rossi”.]
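A minimal sketch of how the three indexes could be computed from an in-memory set of triples (the architecture slide indicates that LODeX extracts them through SPARQL queries instead). Several things here are assumptions made for illustration: the function name `extract_indexes`, the convention that literals are strings wrapped in double quotes, and which of the two organizations carries `ex:ceo`, `ex:activity` and `dbpedia:fax`. Counts are taken to be numbers of distinct instances, which is inferred from the example values rather than stated on the slide.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

RG = {
    ("organization1", RDF_TYPE, "foaf:Organization"),
    ("organization2", RDF_TYPE, "foaf:Organization"),
    ("sector1", RDF_TYPE, "ex:Sector"),
    ("person1", RDF_TYPE, "foaf:Person"),
    ("organization1", "ex:sector", "sector1"),
    ("organization2", "ex:sector", "sector1"),
    ("organization1", "ex:ceo", "person1"),  # assumed subject
    ("organization1", "dbpedia:fax", '"+41331231"'),  # assumed subject
    ("organization2", "ex:activity", '"Village electrification in the Pacific"'),  # assumed subject
    ("sector1", "dc:title", '"Energy"'),
    ("person1", "foaf:firstName", '"Paolo"'),
    ("person1", "foaf:lastName", '"Rossi"'),
}


def is_literal(term):
    # Convention of this sketch only: literals carry surrounding double quotes.
    return term.startswith('"')


def extract_indexes(rg):
    """Return (SC, SCl, OC) as dicts mapping (class, property) to a count."""
    types = defaultdict(set)
    for s, p, o in rg:
        if p == RDF_TYPE:
            types[s].add(o)
    sc, scl, oc = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o in rg:
        if p == RDF_TYPE:
            continue
        subject_index = scl if is_literal(o) else sc
        for c in types.get(s, ()):
            subject_index[(c, p)].add(s)  # distinct subjects per (class, property)
        if not is_literal(o):
            for c in types.get(o, ()):
                oc[(c, p)].add(o)  # distinct objects per (class, property)

    def to_counts(index):
        return {key: len(instances) for key, instances in index.items()}

    return to_counts(sc), to_counts(scl), to_counts(oc)


SC, SCl, OC = extract_indexes(RG)
print(SC)
print(OC)
```

With this toy data, `ex:sector` is counted twice in SC (two organizations use it) but once in OC (both point at the same sector).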
We use an algorithm that combines these indexes to produce a Schema Summary
Name  Values
SC    (foaf:Organization, ex:ceo, 1), (foaf:Organization, ex:sector, 2)
SCl   (foaf:Person, foaf:firstName, 1), (foaf:Person, foaf:lastName, 1), (ex:Sector, dc:title, 1), (foaf:Organization, ex:activity, 1), (foaf:Organization, dbpedia:fax, 1)
OC    (ex:Sector, ex:sector, 1), (foaf:Person, ex:ceo, 1)
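[Figure: resulting Schema Summary. Nodes with instance counts: foaf:Organization (2), ex:Sector (1), foaf:Person (1). Edges: ex:sector (2) from foaf:Organization to ex:Sector; ex:ceo (1) from foaf:Organization to foaf:Person. Attributes: dc:title (1) on ex:Sector; foaf:firstName (1) and foaf:lastName (1) on foaf:Person; ex:activity (1) and dbpedia:fax (1) on foaf:Organization.]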
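The slides do not spell out the combining algorithm, so the following is only a naive sketch of one way to do it: join SC and OC on the property name to obtain edges, and attach SCl entries to their class as attributes. The function name `build_schema_summary` is hypothetical, and the join would produce spurious edges when one property is shared by several domain and range classes. The OC entry for `ex:ceo` uses foaf:Person, as in the example graph.

```python
SC = {("foaf:Organization", "ex:ceo"): 1, ("foaf:Organization", "ex:sector"): 2}
SCl = {
    ("foaf:Person", "foaf:firstName"): 1,
    ("foaf:Person", "foaf:lastName"): 1,
    ("ex:Sector", "dc:title"): 1,
    ("foaf:Organization", "ex:activity"): 1,
    ("foaf:Organization", "dbpedia:fax"): 1,
}
OC = {("ex:Sector", "ex:sector"): 1, ("foaf:Person", "ex:ceo"): 1}


def build_schema_summary(sc, scl, oc):
    """Naive join of the indexes into (nodes, edges, attributes)."""
    # One edge per (domain class, property) in SC and matching (range class, property) in OC.
    edges = [
        (domain, prop, range_, count)
        for (domain, prop), count in sorted(sc.items())
        for (range_, oc_prop) in sorted(oc)
        if oc_prop == prop
    ]
    # Datatype properties become attributes of their domain class.
    attributes = {}
    for (cls, prop), count in sorted(scl.items()):
        attributes.setdefault(cls, []).append((prop, count))
    nodes = sorted(
        {d for d, _, _, _ in edges} | {r for _, _, r, _ in edges} | set(attributes)
    )
    return nodes, edges, attributes


nodes, edges, attributes = build_schema_summary(SC, SCl, OC)
print(nodes)
print(edges)
```

The per-class instance counts shown on the nodes cannot be derived from these three indexes alone, so the sketch leaves them out.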
Two main modules
• Extraction & Summarization
– Index Extraction (IE)
– Post Processing (PP)
• Visualization & Querying
– Schema Summary Visualization
– Query Orchestrator
[Figure: LODeX architecture. Labels: LOD Cloud, Endpoint URLs, SPARQL Queries, LODeX Indexes Extraction, Statistical Indexes, LODeX Post-processing, Schema Summary, NoSQL, Query Orchestrator, Schema Summary Visualization, Basic Query, Results.]
Schema Summary Visualization
Front end of the Web Application, composed of three panels:
• List of datasets indexed in LODeX
• Schema Summary and query building panel
• Refinement panel
Query Orchestrator
• It manages the interaction between the user and the GUI
• It contains a SPARQL compiler that compiles the visual query into a SPARQL query
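To give an idea of what compiling a visual query might involve, here is a small sketch that turns a selection on the Schema Summary (a starting class, some of its attributes, and outgoing edges) into a SPARQL string. It is not the LODeX compiler: the function `compile_visual_query`, its parameters, the variable naming scheme, the use of OPTIONAL for attributes, and the `ex:` namespace IRI are all assumptions.

```python
def compile_visual_query(start_class, attributes=(), edges=(), prefixes=None, limit=100):
    """Build a SPARQL SELECT query from a simple star-shaped visual selection."""
    prefixes = prefixes or {}
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in sorted(prefixes.items())]
    select_vars = ["?s0"]
    required = [f"?s0 a {start_class} ."]
    optional = []
    # Each selected edge adds a linked resource constrained to the target class.
    for n, (prop, target_class) in enumerate(edges, start=1):
        select_vars.append(f"?s{n}")
        required.append(f"?s0 {prop} ?s{n} .")
        required.append(f"?s{n} a {target_class} .")
    # Attributes are optional so that instances lacking them are still returned.
    for n, prop in enumerate(attributes):
        select_vars.append(f"?a{n}")
        optional.append(f"OPTIONAL {{ ?s0 {prop} ?a{n} . }}")
    lines.append("SELECT " + " ".join(select_vars))
    lines.append("WHERE {")
    lines.extend("  " + pattern for pattern in required + optional)
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


query = compile_visual_query(
    "foaf:Organization",
    attributes=["ex:activity"],
    edges=[("ex:ceo", "foaf:Person")],
    prefixes={
        "foaf": "http://xmlns.com/foaf/0.1/",
        "ex": "http://example.org/",  # placeholder namespace
    },
)
print(query)
```

The resulting string could then be sent to the SPARQL endpoint of the selected dataset; result handling and the refinement panel are outside this sketch.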
Wi2015 - Clustering of Linked Open Data - the LODeX tool
