REPAIRING HIDDEN LINKS IN LINKED DATA
ENHANCING THE QUALITY OF RDF KNOWLEDGE GRAPHS
Nandana Mihindukulasooriya, Mariano Rico, Idafen Santana-Pérez,
Raúl García-Castro and Asunción Gómez-Pérez
December 5th, 2017
The Ninth International Conference on Knowledge Capture
K-CAP 2017, Austin, Texas, United States
Things, not strings!
Things as strings!
dbr:Tim_Cook dbo:employer “Apple”
dbr:Apple_Pie dbo:ingredient “Apple”
• Introduces model inconsistencies
• Increases ambiguity
• Reduces connectivity
… after correctly linked
dbr:Tim_Cook dbo:employer dbr:Apple_Inc
dbr:Apple_Inc rdf:type dbo:Company
dbr:Apple_Inc dbo:industry dbr:Consumer_electronics
dbr:Apple_Inc dbo:product dbr:IPhone
dbr:Apple_Pie dbo:ingredient dbr:Apple
dbr:Apple rdf:type yago:Fruit
dbr:Apple dbo:family dbr:Rosaceae
dbr:Apple dbp:fiber 2.4
Is this a common problem?
• Web Data Commons (JSON-LD embedded data)
• http://webdatacommons.org/structureddata/
Property                   # of objects  # of string literals  Literal %  IRI %
schema:creator               16,573,426            15,782,874      95.23   4.77
schema:brand                  4,694,411             2,420,908      51.57  48.43
schema:author                26,193,682             1,500,898       5.73  94.27
schema:hiringOrganization     1,252,870             1,231,840      98.84   1.16
schema:publisher             31,317,151               560,577       1.79  98.21
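The literal and IRI shares in a profile like this could be computed along the following lines. This is a minimal sketch: the toy triples, the `object_kind` tag, and the function name `literal_share` are assumptions made for illustration and are not part of the Web Data Commons tooling.

```python
from collections import Counter

# Hypothetical toy triples: (subject, property, object, object_kind).
# object_kind is "iri" or "literal"; real data would come from an RDF parser.
TRIPLES = [
    ("ex:page1", "schema:creator", "Jane Doe", "literal"),
    ("ex:page2", "schema:creator", "ex:JohnSmith", "iri"),
    ("ex:page3", "schema:creator", "ACME Corp", "literal"),
    ("ex:page4", "schema:publisher", "ex:ACME", "iri"),
]

def literal_share(triples, prop):
    """Return (literal %, IRI %) among the objects of `prop`."""
    kinds = Counter(kind for _, p, _, kind in triples if p == prop)
    total = sum(kinds.values())
    if total == 0:
        raise ValueError(f"no objects found for {prop}")
    literal_pct = 100.0 * kinds["literal"] / total
    return round(literal_pct, 2), round(100.0 - literal_pct, 2)

print(literal_share(TRIPLES, "schema:creator"))
```

A property whose objects are mostly string literals, although it is expected to point at entities, is a candidate for repair.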
Research Questions
• How to identify string literals that denote
entities in a KG?
• How to transform those string literals
into IRIs that correspond to entities they
denote?
• How to measure the improvement in
quality because of the transformation?
Entity relations
• Entity relations link two entities.
• All objects of entity relations should be entities.
• Not all relations are entity relations.
Person → Company: an entity relation
Company → “Company Description”: a non-entity relation
String to Entity IRI Transformation
Such string literals can be transformed to their
corresponding entity IRIs with high precision using both
ontological axioms and data profiling information.
Candidate entity IRIs for “Apple”: dbr:Apple, dbr:Apple_Inc, dbr:Apple_Bank, dbr:The_Apple_Film
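A minimal sketch of how candidate IRIs for a string such as “Apple” might be looked up and then narrowed by an expected type. The label index, the type assignments, and the function name `candidates_for` are hypothetical illustrations; they are not taken from DBpedia or from the authors' implementation.

```python
# Hypothetical label index: lower-cased surface form -> candidate entity IRIs.
LABEL_INDEX = {
    "apple": ["dbr:Apple", "dbr:Apple_Inc", "dbr:Apple_Bank", "dbr:The_Apple_Film"],
}

# Hypothetical type assignments, invented for this example.
ENTITY_TYPES = {
    "dbr:Apple": {"yago:Fruit"},
    "dbr:Apple_Inc": {"dbo:Company"},
    "dbr:Apple_Bank": {"dbo:Company"},
    "dbr:The_Apple_Film": {"dbo:Film"},
}

def candidates_for(literal, expected_type=None):
    """Return candidate IRIs for a string literal, optionally filtered by type."""
    found = LABEL_INDEX.get(literal.strip().lower(), [])
    if expected_type is None:
        return list(found)
    return [iri for iri in found if expected_type in ENTITY_TYPES.get(iri, set())]

print(candidates_for("Apple"))
print(candidates_for("Apple", expected_type="dbo:Company"))
```

In this toy data two candidates survive the type filter, which is why a separate disambiguation step is still needed.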
Connectivity
Connectivity of a knowledge graph can be improved by
transforming literal nodes in entity relations into their
corresponding entity IRIs.
Related Problems - I
• Named entity disambiguation from text
Kill Bill was directed by Quentin Tarantino and stars Uma Thurman.
dbr:Kill_Bill:_Volume_1 dbr:Quentin_Tarantino dbr:Uma_Thurman
Related Problems - II
• Web Table Matching
company  ind.        loc.
IBM      technology  United States
United   airlines    USA
…        …           …
Matched to: dbr:IBM, dbr:Technology, dbr:United_States, dbr:United_Airlines, dbr:Airline, dbo:country
Approach
Identification of entity relations
• Using ontological axioms
• Using data profiling information
String literal to IRI conversion
• Context generation
• Type identification (entity relation range)
• Entity IRI identification
Identification of entity relations
[Figure: the properties of the RDF graph (P1, P2, P3, …, Pn) and the ontology definitions are the inputs; the output separates entity relations from other relations (PX, PY).]
Identification of entity relations
Ontology-driven
• OWL object properties
• Property range
Data-driven
• Entity relation classifier
• Features: IRI%, Lit%, DistinctIRI%, DistinctLit%, String%, Num%, Date%
• Training data: known entity relations, manually annotated rdfs properties
See the paper for the algorithm.
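The slide names the classifier features but not their definitions. The sketch below assumes one plausible reading: each feature is a percentage over the objects of a single property, and the "Distinct" variants are taken over distinct objects. The `(kind, value)` encoding and the function name `profile_features` are hypothetical.

```python
def profile_features(objects):
    """Compute assumed profiling features for one property.

    `objects` is a list of (kind, value) pairs, with kind one of
    "iri", "string", "num" or "date".
    """
    if not objects:
        raise ValueError("property has no objects")
    literal_kinds = {"string", "num", "date"}
    distinct = set(objects)

    def pct(items, kinds):
        # Share of `items` whose kind is in `kinds`, as a percentage.
        return 100.0 * sum(1 for kind, _ in items if kind in kinds) / len(items)

    return {
        "IRI%": pct(objects, {"iri"}),
        "Lit%": pct(objects, literal_kinds),
        "DistinctIRI%": pct(distinct, {"iri"}),
        "DistinctLit%": pct(distinct, literal_kinds),
        "String%": pct(objects, {"string"}),
        "Num%": pct(objects, {"num"}),
        "Date%": pct(objects, {"date"}),
    }

# Toy objects of an employer-like property (invented for illustration).
employer_objects = [
    ("iri", "dbr:Google"),
    ("iri", "dbr:Google"),
    ("string", "Apple"),
    ("string", "Oracle"),
]
features = profile_features(employer_objects)
print(features)
```

A feature vector like this could be fed to any standard classifier trained on properties already known to be entity relations; the choice of classifier is left open here.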
Approach
Identification of entity relations
• Using ontological axioms
• Using data profiling information
String literal to IRI conversion
• Type identification (entity relation range)
• Context generation
• Entity disambiguation
Type identification and entity linking
Property: employer
IRI objects: dbr:Google, dbr:Microsoft, dbr:Yahoo!, dbr:Amazon_(company)
String literal objects: “Apple”, “I.B.M”, “Johnson & Johnson”, “Mars”, “Oracle”, “Sun”
Type prediction (range): range information (ontology), type frequency analysis
Entity linking: type restrictions, entity disambiguation
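Type frequency analysis can be illustrated by counting the types of the IRIs that already appear as objects of a property and keeping the types shared by most of them. This is a sketch under assumptions: the type assignments below are invented, and the `min_support` threshold and the function name `predict_range` are hypothetical, not the authors' algorithm.

```python
from collections import Counter

# Hypothetical rdf:type assertions for IRIs already used as employer objects.
TYPES = {
    "dbr:Google": ["dbo:Company", "dbo:Organisation"],
    "dbr:Microsoft": ["dbo:Company", "dbo:Organisation"],
    "dbr:Yahoo!": ["dbo:Company", "dbo:Organisation"],
    "dbr:Amazon_(company)": ["dbo:Company", "dbo:Organisation", "dbo:Website"],
}

def predict_range(iri_objects, types, min_support=0.5):
    """Return types held by at least `min_support` of the IRI objects.

    Results are ordered by descending frequency, then by name.
    """
    if not iri_objects:
        return []
    counts = Counter(t for iri in iri_objects for t in set(types.get(iri, [])))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, c in ranked if c / len(iri_objects) >= min_support]

print(predict_range(list(TYPES), TYPES))
```

The predicted types can then serve as a restriction when linking the remaining string literals of the same property.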
Context generation
Tim Cook employer Apple.
Timothy Donald "Tim" Cook is an American
business executive, industrial engineer and
developer. Cook is the current and seventh
Chief Executive Officer of Apple Inc.,
previously serving as the company's Chief
Operating Officer, under its ....
Apple Pie ingredient Apple.
An apple pie is a fruit pie, in which the
principal filling ingredient is apple. It is, on
occasion, served with whipped cream or ice
cream on top, or alongside cheddar cheese.
The pastry is generally used top-and-bottom
...
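One simple way to use such a context is to compare its words with a short description of each candidate entity and pick the candidate with the largest overlap. The sketch below uses plain bag-of-words overlap as a stand-in; the slides do not specify the disambiguation method, and the candidate descriptions, the stopword list, and the function names are assumptions.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "of", "in", "and", "or", "on", "it", "which", "that", "by"}

def tokens(text):
    """Lower-cased alphabetic tokens without stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def best_candidate(context, candidates):
    """Pick the candidate IRI whose description shares most tokens with the context."""
    context_tokens = tokens(context)
    scores = {iri: len(context_tokens & tokens(desc)) for iri, desc in candidates.items()}
    return max(scores, key=scores.get), scores

# Hypothetical one-line descriptions of two candidates for "Apple".
CANDIDATES = {
    "dbr:Apple_Inc": "Apple Inc. is an American technology company that designs consumer electronics.",
    "dbr:Apple": "An apple is an edible fruit produced by an apple tree.",
}

cook_context = ("Tim Cook employer Apple. Tim Cook is an American business executive "
                "and the Chief Executive Officer of Apple Inc.")
pie_context = ("Apple Pie ingredient Apple. An apple pie is a fruit pie, in which the "
               "principal filling ingredient is apple.")

print(best_candidate(cook_context, CANDIDATES)[0])
print(best_candidate(pie_context, CANDIDATES)[0])
```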
Evaluation
Evaluation - I
• Can the approach correctly identify entity relations?
• Manually annotated gold standard
• 3 annotators with high inter-annotator agreement
                                   Detected relations
Class            Entity relations  Correct  Incorrect  Prec.   Recall
dbo:Athlete                   183      178         53  77.06%  97.27%
dbo:SportsTeam                157      156         48  76.47%  99.36%
dbo:SportsEvent               116      115         41  73.71%  99.14%
Total                         456      449        142  75.97%  98.47%
High Recall
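The precision and recall columns follow from the counts in the table: precision is correct detections over all detections, and recall is correct detections over the entity relations in the gold standard. A short sketch, with the function name `precision_recall` chosen here for illustration:

```python
def precision_recall(correct, incorrect, gold_total):
    """Precision = correct / detected, recall = correct / gold_total."""
    detected = correct + incorrect
    precision = correct / detected if detected else 0.0
    recall = correct / gold_total if gold_total else 0.0
    return precision, recall

# (entity relations in gold standard, detected correct, detected incorrect)
ROWS = {
    "dbo:Athlete": (183, 178, 53),
    "dbo:SportsTeam": (157, 156, 48),
    "dbo:SportsEvent": (116, 115, 41),
}

for cls, (gold, correct, incorrect) in ROWS.items():
    p, r = precision_recall(correct, incorrect, gold)
    print(f"{cls}: precision={p:.2%} recall={r:.2%}")
```

Recomputed values may differ from the slide in the last digit because of rounding.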
Evaluation - II
• Can the approach correctly disambiguate and link the
identified strings?
• Manual verification of a random sample of 300
Class            Sample size  Disambiguated  Correct  Prec.   Recall
dbo:Athlete              100             51       50  98.04%  50%
dbo:SportsTeam           100             44       44  100%    44%
dbo:SportsEvent          100             58       55  94.83%  55%
Total                    300            153      149  97.38%  49.67%
High Precision
Graph metrics
• connected components
         # of edges  # of components  Isolated components  Largest component size
Before            8                6                    3                       5
After       12 (+4)           2 (-4)               0 (-3)                 12 (+7)
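These four metrics can be computed with a union-find pass over the edges. The sketch below treats edges as undirected and counts only entity nodes, which is an assumption about how the metrics are defined. The toy graph reuses entities from the Apple example and is not the graph drawn on the slide.

```python
from collections import Counter

def component_metrics(nodes, edges):
    """Edge count and connected-component statistics of an undirected graph."""
    parent = {n: n for n in nodes}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    sizes = Counter(find(n) for n in parent)
    return {
        "edges": len(edges),
        "components": len(sizes),
        "isolated": sum(1 for size in sizes.values() if size == 1),
        "largest": max(sizes.values(), default=0),
    }

NODES = ["dbr:Tim_Cook", "dbr:Apple_Inc", "dbr:IPhone",
         "dbr:Apple_Pie", "dbr:Apple", "dbr:Rosaceae"]
# Before repair: the employer and ingredient objects are strings, so no entity edge exists.
BEFORE = [("dbr:Apple_Inc", "dbr:IPhone"), ("dbr:Apple", "dbr:Rosaceae")]
# After repair: the two literals have been replaced by entity IRIs.
AFTER = BEFORE + [("dbr:Tim_Cook", "dbr:Apple_Inc"), ("dbr:Apple_Pie", "dbr:Apple")]

print(component_metrics(NODES, BEFORE))
print(component_metrics(NODES, AFTER))
```

In this toy graph the repair removes both isolated nodes and halves the number of components.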
Evaluation - III
• Does this transformation increase the connectivity of
the graph?
Graph         Edges  Components  Isolated  Largest component size  Largest component % of nodes
Original    828,310     119,623   112,331                 168,128                         54.28
Repaired  1,035,912      99,137    93,507                 192,805                         64.16
Change     +207,602     -20,486   -18,824                 +24,677                         +9.88
Change %    +25.06%     -17.12%   -16.76%                 +14.68%
Improved Connectivity
Conclusions and future work
• A large number of entities are represented as strings in Linked Data.
• The proposed approach can detect such strings with high recall (98%) and convert them to their corresponding entity IRIs with high precision (97%).
• We added 25% more links to a subgraph of DBpedia and reduced its number of connected components by 17%.
• In the future, we will improve the algorithm by using more context information from the graph.
