The Web of Data as a Complex System - First insight into its multi-scale network properties
Christophe Guéret, Shenghui Wang, and Stefan Schlobach
Department of Computer Science, Network Institute, Vrije Universiteit Amsterdam
Outline
- What is the Web of Data?
- How complex is the Web of Data?
- A new way of seeing the Web of Data
- What have we found?
- What are the challenges?
What is the Web of Data?
- "The Semantic Web is a web of data" -- http://www.w3.org/2001/sw/
- "Linked Data is a sub-topic of the Semantic Web. The term Linked Data is used to describe a method of exposing, sharing, and connecting data via dereferenceable URIs on the Web." -- http://en.wikipedia.org/wiki/Linked_Data
- "Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods." -- http://linkeddata.org/
Four principles of Linked Data
1. Use URIs to identify things.
2. Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
3. Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
4. Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
-- Tim Berners-Lee
An example of linked data
[Figure: a graph connecting the following URIs]
http://dbpedia.org/resource/Amsterdam
http://dbpedia.org/resource/City
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://umbel.org/umbel/ne/wikipedia/Amsterdam
http://www.w3.org/2002/07/owl#sameAs
http://www.freebase.com/view/en/abraham_pais
http://dbpedia.org/ontology/birthPlace
- Nodes are shared across statements
- The links have some meaning
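The point that nodes are shared across statements can be sketched with plain tuples. This is only an illustration: the subject and object of each statement below are an assumption read off the figure, since the slide lists the URIs without stating the directions.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BIRTH_PLACE = "http://dbpedia.org/ontology/birthPlace"
AMSTERDAM = "http://dbpedia.org/resource/Amsterdam"

# Assumed (subject, predicate, object) statements; directions are a guess.
triples = [
    (AMSTERDAM, RDF_TYPE, "http://dbpedia.org/resource/City"),
    (AMSTERDAM, OWL_SAME_AS, "http://umbel.org/umbel/ne/wikipedia/Amsterdam"),
    ("http://www.freebase.com/view/en/abraham_pais", BIRTH_PLACE, AMSTERDAM),
]


def shared_nodes(statements):
    """Return the nodes that appear in more than one statement."""
    counts = Counter()
    for s, _p, o in statements:
        counts.update({s, o})  # a set, so a node counts once per statement
    return {node for node, n in counts.items() if n > 1}


print(shared_nodes(triples))
```

Under these assumed statements the only shared node is the DBpedia URI for Amsterdam, which is what ties the three statements into one graph.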
Since 2006, people have been creating linked data
October 2007
July 2009
Evolution of the Web of Data
The WoD is a complex system!
- More than 260 extremely heterogeneous datasets
  - general-purpose datasets, such as DBpedia
  - domain-oriented datasets, such as Bio2RDF
  - government data, music data, geological data, social network data, etc.
- Nearly 50 billion RDF triples
- Nearly 50 billion links within the datasets
- More than 800 million links between the datasets
- Rich semantics embedded in the data
  - data points are typed
  - links are typed
  - links are what make the statements useful
[Figure: a graph over the nodes Christophe, VU Amsterdam, Amsterdam and The Netherlands, with edges labelled workIn and isLocatedIn]
The links have explicit semantics, so a reasoning process can deduce additional, implicit links.
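How reasoning turns explicit links into implicit ones can be sketched with a tiny fixed-point loop. The two rules used here (isLocatedIn is transitive, and workIn propagates along isLocatedIn) and the three explicit statements are assumptions chosen to match the figure's labels; the slides do not spell out the rules.

```python
explicit = {
    ("Christophe", "workIn", "VU Amsterdam"),
    ("VU Amsterdam", "isLocatedIn", "Amsterdam"),
    ("Amsterdam", "isLocatedIn", "The Netherlands"),
}


def infer(triples):
    """Apply the two illustrative rules until no new triple appears.

    Rule (assumed): (x, p, y) and (y, isLocatedIn, z) imply (x, p, z)
    for p in {workIn, isLocatedIn}.
    """
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            if p not in ("workIn", "isLocatedIn"):
                continue
            for s2, p2, o2 in known:
                if s2 == o and p2 == "isLocatedIn":
                    new.add((s, p, o2))
        if new <= known:  # fixed point reached
            return known - set(triples)
        known |= new


implicit = infer(explicit)
for triple in sorted(implicit):
    print(triple)
```

With these rules the three explicit statements yield three implicit ones, including that Christophe works in The Netherlands. A network built from the explicit triples alone would miss those edges.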
People are trying to use the WoD
Billion Triple Challenges since 2008
"The specific goal of the Billion Triples Track is to demonstrate the scalability of applications as well as to encourage the development of applications that can deal with Web data. We stress that the goal of this is not to be a benchmarking effort between triple stores, but rather to demonstrate applications that can scale to a Web scale using realistic Web-quality data." -- http://challenge.semanticweb.org/
The WoD itself should be robust
- Are there central hubs whose failure would lead to a loss of connectivity?
- The WoD is designed for automated agents, which are less able to recover from a loss of connectivity.
- The robustness of the WoD should be ensured
- Up to now, the WoD could be studied, searched and maintained like a classical database
Network analysis
- A new way of seeing the WoD
- What network analysis tells us
A new way of seeing the WoD
Consider the WoD as a network
Applying network analysis to the WoD
- Average path length
- Degree distribution
- Strongly connected components
- Degree centrality
- Betweenness centrality
- Closeness centrality
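Two of the measures listed above can be sketched in a few lines of plain Python. The toy graph, its node names and the normalisation of degree centrality by n - 1 are assumptions for illustration; the slides do not say which tool or conventions produced the reported values.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source along directed edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs that are connected."""
    total = pairs = 0
    for u in adj:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def degree_centrality(adj):
    """(in-degree + out-degree) / (n - 1) for every node."""
    nodes = set(adj) | {v for targets in adj.values() for v in targets}
    degree = dict.fromkeys(nodes, 0)
    for u, targets in adj.items():
        for v in targets:
            degree[u] += 1
            degree[v] += 1
    scale = max(len(nodes) - 1, 1)
    return {u: d / scale for u, d in degree.items()}


# Hypothetical three-node chain a -> b -> c.
toy = {"a": ["b"], "b": ["c"], "c": []}
print(average_path_length(toy))
print(degree_centrality(toy))
```

Averaging only over connected pairs is one common convention for graphs that are not strongly connected; other conventions give different numbers on the same graph.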
Scales of observation of the WoD
1. Graph scale
Graph-scale WoD network
- Each dataset is a node
- Edges are weighted, directed connections between the datasets
  - if at least one triple has its subject in dataset 1 and its object in dataset 2, there is an edge from dataset 1 to dataset 2
  - the number of such triples is the weight of the edge
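The edge rule above can be sketched as a single counting pass over the triples. The `dataset_of` helper and the prefixed node names are hypothetical; how a resource is assigned to a dataset is not described in the slides. Skipping links that stay inside one dataset is also an assumption.

```python
from collections import Counter


def dataset_of(node):
    """Hypothetical mapping: the part before ':' names the dataset."""
    prefix, sep, _ = node.partition(":")
    return prefix if sep else None


def dataset_graph(triples, dataset_of):
    """Weighted, directed dataset-to-dataset edges.

    The weight of (d1, d2) is the number of triples whose subject is in d1
    and whose object is in d2. Links inside one dataset are skipped.
    """
    weights = Counter()
    for s, _p, o in triples:
        src, dst = dataset_of(s), dataset_of(o)
        if src is not None and dst is not None and src != dst:
            weights[(src, dst)] += 1
    return weights


# Toy triples with made-up local names.
toy_triples = [
    ("dbpedia:x1", "owl:sameAs", "geonames:y1"),
    ("dbpedia:x2", "owl:sameAs", "geonames:y2"),
    ("geonames:y1", "owl:sameAs", "dbpedia:x1"),
    ("dbpedia:x1", "rdfs:seeAlso", "dbpedia:x2"),
]
graph = dataset_graph(toy_triples, dataset_of)
print(dict(graph))
```

Because edges are directed, links from dbpedia to geonames and links back are counted separately.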
- 110 nodes with 350 edges
- Average path length is 2.16
- 50 components
A degree of 7 is the critical point beyond which the network is no longer scale-free.
Top central nodes
Every centrality has a specific meaning...

Betweenness centrality
Node            Value
DBpedia         0.332
DBLP Berlin     0.108
DBLP (RKB)      0.100
DBLP Hannover   0.097
FOAF profiles   0.075

Closeness centrality
Node            Value
DBpedia         0.762
Geonames        0.614
Drug Bank       0.576
Linked MDB      0.544
Flickr wrappr   0.526

Degree centrality
Node            Value
DBpedia         0.505
UniProt         0.266
DBLP (RKB)      0.266
ACM (RKB)       0.229
GeneID          0.211
Scales of observation of the WoD 2. Triple scale
Triple-scale WoD network
- We took 10 million triples from the dataset crawled from the WoD and provided by the Billion Triple Challenge 2009
- This "BTC" network is defined as G = (V, (E, L)), where
  - V is a set of nodes, and each node is a URI or a literal
  - E is a set of edges
  - L is a set of labels, each label characterising a relation between nodes
- We applied a few strategies to aggregate data for comparison.
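The definition G = (V, (E, L)) can be read as a graph whose edges each carry one label. The sketch below builds V, E and L from a list of triples under that reading; storing an edge as a (subject, object, label) tuple is an assumption, as the slide does not say how E and L are tied together.

```python
def build_labelled_graph(triples):
    """Return (V, E, L) for a list of (subject, predicate, object) triples."""
    V, E, L = set(), set(), set()
    for s, p, o in triples:
        V.update((s, o))   # subjects and objects (URIs or literals) are nodes
        E.add((s, o, p))   # one directed edge per triple, tagged with its label
        L.add(p)           # the predicate is the label
    return V, E, L


# Toy triples; the names are placeholders, not data from the BTC crawl.
V, E, L = build_labelled_graph([
    ("Christophe", "workIn", "VU Amsterdam"),
    ("VU Amsterdam", "isLocatedIn", "Amsterdam"),
    ("Amsterdam", "isLocatedIn", "The Netherlands"),
])
print(len(V), len(E), len(L))
```

Keeping the label on the edge allows two nodes to be linked by several relations at once, which matters for the multi-relation challenge raised later in the slides.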
Triple-scale network and its aggregations
- BTC aggregated: triples are aggregated by the domain names
- BTC aggregated + filter: only domain names shared with the graph-scale network

Network                  Nodes  Edges  Average path length  Components
BTC                      605K   860K   2.15                 602K
BTC aggregated           14K    31K    2.80                 7K
BTC aggregated + filter  37     91     1.88                 17
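Aggregating by domain name can be sketched with the standard library's URL parser. This is one plausible reading of the two strategies above, not the authors' code: literals and non-HTTP nodes are dropped, links inside one domain are kept, and the optional `keep` set stands in for the domain names shared with the graph-scale network.

```python
from urllib.parse import urlparse


def domain(node):
    """Host name of an http(s) URI, or None for literals and other nodes."""
    parsed = urlparse(node)
    if parsed.scheme in ("http", "https") and parsed.hostname:
        return parsed.hostname
    return None


def aggregate_by_domain(triples, keep=None):
    """Collapse triples into directed domain-to-domain edges.

    If keep is given, only edges whose two ends are both in keep survive.
    """
    edges = set()
    for s, _p, o in triples:
        ds, do = domain(s), domain(o)
        if ds is None or do is None:
            continue  # skip literals
        if keep is not None and not (ds in keep and do in keep):
            continue
        edges.add((ds, do))
    return edges


sample = [
    ("http://dbpedia.org/resource/Amsterdam",
     "http://www.w3.org/2002/07/owl#sameAs",
     "http://umbel.org/umbel/ne/wikipedia/Amsterdam"),
    ("http://dbpedia.org/resource/Amsterdam",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Amsterdam"),
]
print(aggregate_by_domain(sample))
print(aggregate_by_domain(sample, keep={"dbpedia.org"}))
```

The filter shows why the third row of the table is so much smaller: an edge survives only when both of its ends belong to the kept domains.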
Degree distribution
[Figure: degree distribution plots for the BTC and BTC aggregated networks, labelled "Power-law distribution"]
Top central nodes:
The next steps
- Open challenges
- Ongoing research activities at VUA
Challenges: Existence of implicit links
"Semantic virus"
[Figure: a graph over the nodes Christophe, VU Amsterdam, Amsterdam, The Netherlands and Asia, with edges labelled workIn and isLocatedIn]
Challenges: Multi-relation links
- FOAF (social networks + personal information)
- SIOC (relations characterising blogs)
- SWRC (describing research work)
- …
- Different filters produce different networks
- The centrality of a node changes with the network considered
Dynamics
- Data will be continuously added and linked.
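The effect of filtering on centrality can be sketched by splitting one set of triples by vocabulary prefix and comparing degrees. The people, papers and the specific predicates used here are illustrative placeholders, and plain degree stands in for the centrality measures used in the study.

```python
from collections import Counter


def filter_by_vocabulary(triples, prefix):
    """Keep only triples whose predicate starts with the given prefix."""
    return [(s, p, o) for s, p, o in triples if p.startswith(prefix)]


def degree(triples):
    """Undirected degree of every node in the network the triples induce."""
    deg = Counter()
    for s, _p, o in triples:
        deg[s] += 1
        deg[o] += 1
    return deg


# Hypothetical data mixing a social vocabulary and a research vocabulary.
mixed = [
    ("alice", "foaf:knows", "bob"),
    ("alice", "foaf:knows", "carol"),
    ("paper1", "swrc:author", "bob"),
    ("paper2", "swrc:author", "bob"),
    ("paper3", "swrc:author", "bob"),
    ("paper1", "swrc:author", "carol"),
]

social = degree(filter_by_vocabulary(mixed, "foaf:"))
research = degree(filter_by_vocabulary(mixed, "swrc:"))
print(social.most_common(1), research.most_common(1))
```

In this toy data the best-connected node is alice in the social network and bob in the research network, so a ranking of central nodes only makes sense relative to the chosen filter.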
"sameAs" networks
Monitoring and Improving the WoD
- Linked data is meant to be browsed, jumping from one resource to another
- The presence of hubs is critical for the paths
- Create alternate paths to be used in case of failure
Guéret, Groth, van Harmelen, Schlobach, "Finding the Achilles Heel of the Web of Data: using network analysis for link-recommendation", ISWC 2010 (to appear)
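Why hubs are critical, and what alternate paths buy, can be sketched by counting connected components before and after a node fails. The star-shaped toy network and its node names are hypothetical, and this simple check is not the link-recommendation method of the cited paper.

```python
def count_components(edges, removed=frozenset()):
    """Number of weakly connected components once removed nodes are gone."""
    nodes = {n for edge in edges for n in edge} - set(removed)
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)  # ignore direction for connectivity
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
    return count


# Every path goes through the hub.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
# The same network with two extra links that bypass the hub.
patched = star + [("a", "b"), ("b", "c")]

print(count_components(star), count_components(star, removed={"hub"}))
print(count_components(patched, removed={"hub"}))
```

In the star network, losing the hub splits the three remaining nodes apart. With the two extra links they stay in one component, which is the kind of redundancy the slide argues for.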
{cgueret, swang, schlobac}@few.vu.nl
We need to study more!

ECCS 2010


Editor's Notes

  • #10: All previous versions of the picture are available at http://richard.cyganiak.de/2007/10/lod/