Assessing Linked Data Mappings using
                   Network Measures

   Christophe Guéret, Paul Groth, Claus Stadler, Jens Lehmann


                9th Extended Semantic Web Conference (ESWC)
                                May 29, 2012




   http://latc-project.eu   http://aksw.org   http://www.vu.nl
ESWC - May 2012   Assessing Linked Data mappings   1/25
The next 25+5 minutes
     The impact of links in the Web of Data


     Main questions
         What is the impact of link creation?
         Can we detect “bad” links based on their impact?
         Is adding links always a good thing?


     Contributions
         A framework to assess the impact of links
         Results for 5 metrics
Is this a good or a bad link?




Measuring the Web of Data
     Look at the topology using network analysis tools


     Impossible to get the complete graph
         Sampling of the graph focusing on specific nodes
         See the bigger picture through aggregation


     Build the local network around a resource


     Repeat the process a sufficient number of times

Network sampling process
     Use a SPARQL endpoint or dereference the
     resources to get their descriptions

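The sampling step can be sketched as follows. This is a minimal illustration, not LinkQA's implementation: the triples are hypothetical stand-ins for descriptions that would be fetched from a SPARQL endpoint or by dereferencing, and the names `describe`, `local_network` and `radius` are assumptions made for the example.

```python
# Hypothetical triples standing in for fetched resource descriptions.
TRIPLES = [
    ("ex:Amsterdam", "ex:locatedIn", "ex:Netherlands"),
    ("ex:Amsterdam", "owl:sameAs", "geo:Amsterdam"),
    ("ex:VU", "ex:locatedIn", "ex:Amsterdam"),
    ("geo:Amsterdam", "geo:country", "geo:NL"),
    ("ex:Netherlands", "ex:capital", "ex:Amsterdam"),
    ("ex:Leipzig", "ex:locatedIn", "ex:Germany"),
]


def describe(resource, triples):
    """Return the triples in which the resource is subject or object."""
    return [t for t in triples if t[0] == resource or t[2] == resource]


def local_network(target, triples, radius=1):
    """Collect the edges found within `radius` hops of the target."""
    edges, frontier, seen = set(), {target}, {target}
    for _ in range(radius):
        next_frontier = set()
        for node in frontier:
            for s, p, o in describe(node, triples):
                edges.add((s, p, o))
                for neighbour in (s, o):
                    if neighbour not in seen:
                        seen.add(neighbour)
                        next_frontier.add(neighbour)
        frontier = next_frontier
    return edges


print(len(local_network("ex:Amsterdam", TRIPLES, radius=1)))  # 4
print(len(local_network("ex:Amsterdam", TRIPLES, radius=2)))  # 5
```

Repeating this for many sampled target resources gives the set of local networks on which the metrics are computed.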



Aggregation of local results


     [Figure legend: Observed, Target]




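One way to picture the aggregation is to turn the local scores into a normalised histogram and compare it with a target distribution. The sketch below is illustrative only: the scores and the target are made-up values, and total variation distance is an assumed choice of comparison, not necessarily the one used in LinkQA.

```python
from collections import Counter


def distribution(scores):
    """Turn a list of local scores into a normalised histogram."""
    counts = Counter(scores)
    total = len(scores)
    return {value: n / total for value, n in counts.items()}


def distance(observed, target):
    """Total variation distance between two discrete distributions."""
    keys = set(observed) | set(target)
    return 0.5 * sum(abs(observed.get(k, 0.0) - target.get(k, 0.0)) for k in keys)


# Hypothetical degree scores from sampled local networks,
# before and after adding a set of links.
before = [1, 1, 1, 2, 2, 3, 1, 8]
after = [2, 2, 2, 2, 3, 3, 2, 8]
# Hypothetical target distribution.
target = {1: 0.5, 2: 0.25, 3: 0.125, 8: 0.125}

print(distance(distribution(before), target))  # 0.0
print(distance(distribution(after), target))   # 0.5
```

In this toy case the added links move the observed distribution away from the target, which is the kind of change the framework is meant to flag.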
Metrics
     Compute local scores for a resource


     Criteria
         Use only the local network
         Representative of a global property
         Not sensitive to change of observation scale


     5 metrics currently available in LinkQA


What do we want to see?
     Increase of connectivity within topical groups
         Increase chances of finding related information


     More bridges between topical groups
         Improve browsing capabilities


     More connectivity around hubs
         Decrease the dependency upon the hubs



Metric 1 – Degree
                                      Metric
                                           Number of edges
                                           around the target node


                                      Target
                                           Power-law distribution
                                           of values


                                      Intuition
                                           Presence of hubs
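A minimal sketch of the degree score on a local network, assuming edges are stored as (subject, predicate, object) triples. The data and the function name `degree` are hypothetical.

```python
# Hypothetical local network as (subject, predicate, object) triples.
EDGES = [
    ("ex:Amsterdam", "ex:locatedIn", "ex:Netherlands"),
    ("ex:VU", "ex:locatedIn", "ex:Amsterdam"),
    ("ex:Amsterdam", "owl:sameAs", "geo:Amsterdam"),
    ("ex:Leipzig", "ex:locatedIn", "ex:Germany"),
]


def degree(target, edges):
    """Count the edges that touch the target node, in either direction."""
    return sum(1 for s, _, o in edges if target in (s, o))


print(degree("ex:Amsterdam", EDGES))  # 3
```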

Metric 2 – Clustering coefficient
                                      Metric
                                           Density of links around
                                           the target node


                                      Target
                                           Increase clustering
                                           around nodes


                                      Intuition
                                           Topical clusters
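The density of links around a node can be illustrated with the standard local clustering coefficient: the fraction of pairs of neighbours that are themselves connected. The sketch assumes edge direction is ignored; the edge list and function name are hypothetical.

```python
from itertools import combinations


def clustering_coefficient(target, edges):
    """Fraction of pairs of the target's neighbours that are linked."""
    # Ignore direction and self-loops.
    undirected = {frozenset((s, o)) for s, o in edges if s != o}
    neighbours = {n for pair in undirected if target in pair for n in pair}
    neighbours.discard(target)
    if len(neighbours) < 2:
        return 0.0
    possible = len(neighbours) * (len(neighbours) - 1) / 2
    linked = sum(
        1 for a, b in combinations(neighbours, 2)
        if frozenset((a, b)) in undirected
    )
    return linked / possible


# Hypothetical edges: t has three neighbours, one pair of which is linked.
EDGES = [("t", "a"), ("t", "b"), ("c", "t"), ("a", "b")]
print(clustering_coefficient("t", EDGES))  # 1 linked pair out of 3 possible
```

Adding a link between two neighbours of `t`, for example `("b", "c")`, raises the score, which matches the slide's target of increased clustering.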

Metric 3 – Centrality
                                      Metric
                                           Ratio between outgoing
                                           and incoming links


                                      Target
                                           Lower the discrepancy
                                           between the values


                                      Intuition
                                           Hubs are sensitive
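A small sketch of the ratio. The slide does not fix the direction of the ratio or the handling of nodes without outgoing links, so both are assumptions here: incoming over outgoing, with infinity when a node has incoming links only.

```python
def in_out_ratio(target, edges):
    """Ratio of incoming to outgoing links for the target (assumed direction)."""
    incoming = sum(1 for s, o in edges if o == target)
    outgoing = sum(1 for s, o in edges if s == target)
    if outgoing == 0:
        # Assumed convention for nodes with no outgoing links.
        return float("inf") if incoming else 0.0
    return incoming / outgoing


# Hypothetical hub with three incoming links and one outgoing link.
EDGES = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "d")]
print(in_out_ratio("hub", EDGES))  # 3.0
```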

Metric 4 – SameAs chains
                                      Metric
                                           Number of “open”
                                           sameAs chains


                                      Target
                                           No open sameAs


                                      Intuition
                                           Peer agreement
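The slide does not define "open" precisely. The sketch below assumes one possible reading: a chain that starts with a sameAs link from the target is closed if following further sameAs links leads back to the target, and open otherwise. Data and function name are hypothetical.

```python
def open_sameas_chains(target, triples):
    """Count sameAs chains leaving the target that never lead back to it."""
    same = {}
    for s, p, o in triples:
        if p == "owl:sameAs":
            same.setdefault(s, set()).add(o)

    def returns_to_target(start):
        # Depth-first search along sameAs links only.
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in same.get(node, ()):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return sum(1 for first in same.get(target, ()) if not returns_to_target(first))


# Hypothetical data: a -> b -> c -> a is closed, a -> d is open.
TRIPLES = [
    ("a", "owl:sameAs", "b"),
    ("b", "owl:sameAs", "c"),
    ("c", "owl:sameAs", "a"),
    ("a", "owl:sameAs", "d"),
]
print(open_sameas_chains("a", TRIPLES))  # 1
```

Under this reading, a closed chain means other publishers agree with the mapping, which is the "peer agreement" intuition.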


Metric 5 – Description enrichment
                                      Metric
                                           Richness of resource
                                           description


                                      Target
                                           Increase as much as possible


                                      Intuition
                                           “SameAsed” resources
                                           are complementary
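As an illustration, enrichment can be approximated by counting the statements that sameAs-linked resources contribute beyond what the target already states. This definition of "richness" is an assumption for the sketch, and the triples are hypothetical.

```python
def enrichment(target, triples):
    """Count (predicate, object) pairs gained through the target's sameAs links."""
    own = {(p, o) for s, p, o in triples if s == target and p != "owl:sameAs"}
    aliases = {o for s, p, o in triples if s == target and p == "owl:sameAs"}
    gained = {
        (p, o) for s, p, o in triples
        if s in aliases and p != "owl:sameAs"
    } - own
    return len(gained)


# Hypothetical descriptions of two resources linked by sameAs.
TRIPLES = [
    ("ex:Amsterdam", "rdfs:label", "Amsterdam"),
    ("ex:Amsterdam", "owl:sameAs", "geo:Amsterdam"),
    ("geo:Amsterdam", "rdfs:label", "Amsterdam"),
    ("geo:Amsterdam", "geo:country", "geo:NL"),
    ("geo:Amsterdam", "geo:featureClass", "geo:City"),
]
print(enrichment("ex:Amsterdam", TRIPLES))  # 2
```

The shared label adds nothing, while the two statements unique to the linked resource count as enrichment.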

Under the hood of LinkQA




     http://www.flickr.com/photos/cradlehall/5747161514
Workflow of an analysis




Output of an analysis
     Results on the node and aggregated scale


     Per metric:
         Indication of change with respect to the target
         List of outlier nodes, sorted by their distance to
         the target


     Plus, a global ranking of nodes


     => Input for manual inspection by an expert
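The two kinds of ranking can be sketched as follows. The slides do not say how the global ranking is computed, so averaging each node's position across the per-metric lists is an assumption, as are the distances and names used.

```python
def rank_by_distance(distances):
    """Order nodes from largest to smallest distance to the target."""
    return sorted(distances, key=lambda node: (-distances[node], node))


def global_ranking(per_metric):
    """Combine per-metric rankings by mean position (assumed rule)."""
    positions = {}
    for distances in per_metric.values():
        for pos, node in enumerate(rank_by_distance(distances)):
            positions.setdefault(node, []).append(pos)
    return sorted(
        positions,
        key=lambda n: (sum(positions[n]) / len(positions[n]), n),
    )


# Hypothetical per-node distances to the target for two metrics.
PER_METRIC = {
    "degree": {"n1": 0.9, "n2": 0.1, "n3": 0.5},
    "clustering": {"n1": 0.7, "n2": 0.2, "n3": 0.8},
}
print(rank_by_distance(PER_METRIC["degree"]))  # ['n1', 'n3', 'n2']
print(global_ranking(PER_METRIC))              # ['n1', 'n3', 'n2']
```

Nodes at the top of these lists are the ones an expert would inspect first.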
Experimental results




Global impact of links
     Observe the distributions to detect bad links




First evaluation
     160 linking specifications for Silk, developed in
     the context of LATC


     6 linking specifications with manual verification of
     results
         50 positive links
         50 negative links


     Execute LinkQA with 10 samples of 50 links

Results of the detection




     “C” if change detected in > 50% of runs
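The flagging rule amounts to a strict majority vote over runs. In this sketch the per-run booleans are hypothetical, and the "-" marker for "no change" is an assumption since the slide only names "C".

```python
def change_flag(run_results):
    """Return "C" if a change was detected in more than 50% of the runs."""
    detected = sum(1 for r in run_results if r)
    return "C" if detected > len(run_results) / 2 else "-"


# Hypothetical outcomes of 10 runs for one metric.
print(change_flag([True] * 6 + [False] * 4))  # C
print(change_flag([True] * 5 + [False] * 5))  # -
```

Exactly half of the runs is not enough, because the threshold is strictly greater than 50%.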

Some explanations
     Low sensitivity of metrics:
         Lack of data
         Stable change


     50/50 accuracy of detection:
         Targets may not be the right ones
         Sample may not be big enough
         Semantics-agnostic measures perform worse



A closer look at the outliers
     See if the outliers are necessarily bad links




Second evaluation
     Linking specifications for Silk, developed in the
     context of LATC


     All linking specifications sampled to have
         45 positive links
         5 negative links


     Execute LinkQA five times, on five samples

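The sampling setup above can be sketched as drawing 45 verified-correct and 5 verified-incorrect links per sample, five times. The link identifiers, pool sizes and the function name `make_sample` are hypothetical.

```python
import random


def make_sample(positive, negative, n_pos=45, n_neg=5, seed=None):
    """Draw a shuffled sample with a fixed mix of positive and negative links."""
    rng = random.Random(seed)
    sample = rng.sample(positive, n_pos) + rng.sample(negative, n_neg)
    rng.shuffle(sample)
    return sample


# Hypothetical pools of manually verified links.
positive = [f"pos{i}" for i in range(100)]
negative = [f"neg{i}" for i in range(20)]

# Five samples, one per run.
samples = [make_sample(positive, negative, seed=i) for i in range(5)]
print([len(s) for s in samples])  # [50, 50, 50, 50, 50]
```

With only 5 negative links among 50, a useful ranking should place those 5 near the top of the outlier list.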


Rank of positive and negative links




Take home message
     LinkQA is a node-centric approach to measure the
     impact of links in the Web of Data (WoD) network
         Scalable, can be distributed


     Current results show that
         The 5 metrics defined need to be improved
         Metrics considering Semantics perform better
         The network sample seems too small
         Outlier detection improves with the number of metrics



