The Simple Knowledge Organization System (SKOS) in the context of Semantic Web Deployment Alistair Miles http://purl.org/net/aliman Library of Congress May 2008
THE FUTURE OF THE WEB
Testimony of Sir Timothy Berners-Lee, CSAIL Decentralized Information Group, Massachusetts Institute of Technology, before the United States House of Representatives Committee on Energy and Commerce, Subcommittee on Telecommunications and the Internet. http://dig.csail.mit.edu/2007/03/01-ushouse-future-of-the-web.html
I. Foundations of the Web: “The success of the World Wide Web, itself built on the open Internet, has depended on three critical factors: unlimited links from any part of the Web to any other; open technical standards as the basis for continued growth of innovation applications; and separation of network layers, enabling independent innovation for network transport, routing and information applications.”
A. Universal linking: Anyone can connect to anyone... “In simple terms, the Web has grown because it's easy to write a Web page and easy to link to other pages.” “What makes it easy to create links ... is that there is no limit to the number of pages or number of links possible on the Web.” “Adding a Web page requires no coordination with any central authority, and has an extremely low, often zero, additional cost.”
“Adding a page provides content, but adding a link provide the organization, structure and endorsement to information on the Web which turn the content as a whole into something of great value.”
“The universality and flexibility of the Web's linking architecture has a unique capacity to break down boundaries of distance, language, and domains of knowledge.” “These traditional barriers fall away because the cost and complexity of a link is unaffected by most boundaries that divide other media.”
“The Web's ability to allow people to forge links is why we refer to it as an abstract information space, rather than simply a network.”
II. Looking forward: “First, the Web will get better and better at helping us to manage, integrate, and analyze data.” “Today, the Web is quite effective at helping us to publish and discover documents, but the individual information elements within those documents ... cannot be handled directly as data.”
“Today you can see the data with your browser, but can't get other computer programs to manipulate or analyze it without going through a lot of manual effort yourself.” “As this problem is solved, we can expect that Web as a whole to look more like a large database or spreadsheet, rather than just a set of linked documents.”
A. Data Integration: “Locked within all of this data is the key to knowledge about how to cure diseases, create business value, and govern our world more effectively.” “The good news is that a number of technical innovations ... (RDF which is to data what HTML is to documents, and the Web Ontology Language (OWL) which allows us to express how data sources connect together) ... along with more openness in information sharing practices are moving the World Wide Web toward what we call the Semantic Web.”
“Progress toward better data integration will happen through use of the key piece of technology that made the World Wide Web so successful: the link.” “The power of the Web today, including the ability to find the pages we're looking for, derives from the fact that documents are put on the Web in standard form, and then linked together.”
“The Semantic Web will enable better data integration by allowing everyone who puts individual items of data on the Web to link them with other pieces of data using standard formats.”
DATA WEBS FOR E-SCIENCE
FlyWeb Project: fruit flies (Drosophila ...); model organism; extensive body of genetic research; much of that knowledge is in journal papers; recognised value of research data; establish public databases, e.g. FlyBase; centrally curated.
Data Webs: link data resources; ask questions that no single data resource can answer. What’s the easiest, cheapest, most scalable way to achieve this? Agile approach: add value incrementally, return value early and often...
Data Web Layer Cake (diagram). Layers: Level 0 – Any Data Resources in the Web; Level 1 – SPARQL End-points; Level 2 – SPARQL End-points (Schema Alignment); Level 3 – SPARQL End-points (Integrated Data). Vertical Web Apps: Web 2 Mash-ups; SPARQL Mash-ups; SPARQL Mash-ups; ???
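To make "SPARQL end-point" in the layer cake a little more concrete: under the W3C SPARQL protocol, a client can send a query to an end-point as an HTTP GET request that carries the query text in a `query` parameter. The sketch below only builds such a request URL. The end-point address and the query are hypothetical placeholders, not services named in the talk.

```python
from urllib.parse import urlencode

# Hypothetical end-point address, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# A small SPARQL query: list up to ten things together with their labels.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing ?label
WHERE { ?thing rdfs:label ?label }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return the GET URL that carries `query` in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

A mash-up in the sense of the diagram would send requests like this to several end-points and combine the results.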
Example Application [insert screenshot of mashup]
Future, self-publishing: As publishing data on the Web becomes easier ... more research groups will publish their own data ... rich network of data resources ... challenging traditional view of scholarly life cycle & value chain ... value grid ...
SKOS
Potted History: SKOS 2001 (pre-alpha), Thesaurus Interchange Format (TIF), LIMBER Project; SKOS 2003 (alpha), Semantic Web Advanced Development for Europe (SWAD-Europe); SKOS 2005 (beta), W3C Semantic Web Best Practices and Deployment Working Group (SWBPD); SKOS 2008 (W3C Recommendation), W3C Semantic Web Deployment Working Group (SWD).
http://www.w3.org/2007/Talks/1211-whit-tbl/
Layers in the Web (http://www.w3.org/2007/Talks/1211-whit-tbl/#(23)): The third layer is a network (graph) of connections beyond documents ... people, organisations, genes, proteins, concepts ... Represent these connections (data) in the (Semantic) Web.
KOS, e.g. LCSH: can be viewed as a network of interconnected concepts; represent LCSH as data in the Web; make those concepts and their interconnections part of the Web.
Publishing KOS in the Web? Use RDF: the basic framework for data in the Web – resources, literals, links ... (“graphs” of data).
Publishing KOS in the Web? Use SKOS: a standard set of resource types (classes) and link types (properties) for representing KOS as RDF data. (N.B. Because SKOS uses URIs as names for classes and properties, we call this an RDF vocabulary.)
SKOS Resource Types (Classes): skos:Concept, e.g. Baseball in art; skos:ConceptScheme, e.g. LCSH itself.
SKOS Link Types (Properties): for labeling concepts, skos:prefLabel, skos:altLabel, skos:hiddenLabel; for documenting concepts, skos:note, skos:scopeNote, skos:definition, skos:editorialNote...; for linking concepts, skos:broader, skos:narrower, skos:related.
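As a rough illustration of how the classes and properties listed above combine, the following sketch writes one concept scheme and one concept as RDF triples and prints them in N-Triples syntax. It uses plain Python tuples, not an RDF library. The `example.org` URIs are invented for the example and are not real LCSH identifiers, and the `skos:broader` link is illustrative only.

```python
# Published namespace URIs for RDF and SKOS.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical URIs invented for this sketch; not real LCSH identifiers.
SCHEME = "http://example.org/lcsh"
BASEBALL_IN_ART = "http://example.org/lcsh/baseball-in-art"
BASEBALL = "http://example.org/lcsh/baseball"


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(text, lang="en"):
    """Format a language-tagged string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


# Each triple is (subject URI, property URI, already-formatted object term).
triples = [
    (SCHEME, RDF + "type", uri(SKOS + "ConceptScheme")),
    (BASEBALL_IN_ART, RDF + "type", uri(SKOS + "Concept")),
    (BASEBALL_IN_ART, SKOS + "inScheme", uri(SCHEME)),
    (BASEBALL_IN_ART, SKOS + "prefLabel", literal("Baseball in art")),
    # Illustrative link only; the real LCSH hierarchy may differ.
    (BASEBALL_IN_ART, SKOS + "broader", uri(BASEBALL)),
]


def to_ntriples(rows):
    """Serialise the triples as one N-Triples statement per line."""
    return "\n".join(f"{uri(s)} {uri(p)} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

Every name in the output is a URI, which is what lets other data sets link to the concept directly.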
http://inkdroid.org/bzr/lcsh/docs/slides/
Publishing LCSH in the Web: project LCSH into RDF (i.e. create an RDF representation); publish it in the Web as linked data. http://lcsh.info Ed Summers, Clay Redding, Dan Krech, Antoine Isaac
Scope of SKOS: Will SKOS be an all-encompassing standard for the lossless representation and exchange of all varieties of knowledge organisation system? No. http://lists.w3.org/Archives/Public/public-swd-wg/2008Feb/0116.html -- Antoine Isaac
“...the things that we aim at representing are very diverse: some classification schemes use ‘codes’ and refer to ‘classes’, thesauri have ‘terms’ and so on.”
“Yet, it happens, looking at the way these things are used now and will be in the near future (with more and more links established between them), that (i) some standardisation has to take place, and that (ii) this standardisation can be actually grounded on some observed practical similarities (http://www.w3.org/TR/skos-ucr/)” “Our aim is not to replace the original objects in their initial context of use, but to allow to port them to a shared space, based on a simplified model, enabling wider re-use and better interoperability.”
Lessons from the Web: Less is more ... e.g. REST over SOAP. SKOS should capture a small amount of common ground ... just enough to enable KOS’s valuable concepts and connections to be deployed in the Web and be linked to/from. N.B. SKOS is infinitely extensible! Easy to mix & match; easy to refine.
THE VALUE OF LINKS
The value of links: The Web showed that links between documents are really useful. Google’s PageRank showed that the structure of the network means something (and is worth something!). Social networking Web sites showed how much we value other kinds of links.
Linked Metadata: You’ve got LCSH in the Web, what next? ... Linked metadata...?
http://inkdroid.org/bzr/lcsh/docs/slides/
[insert demo, show how links change topology of information space]
Value Proposition: Links are paths to the discovery of information. Links can be exploited in useful (and surprising) ways. Well-established KOS like LCSH can be hubs in the network of linked metadata, bridging ... (On the Semantic Web, LCSH should get very high semantic pagerank!)
USING URIS
Why use URIs? Identifier management; data discovery.
Identifier management: Referring to things. In a database, each table has a primary key. What happens when you try to combine data from 2 databases? Identifier clashes (ambiguous reference) and identifier aliases (co-reference). Clashes hurt precision and give you nonsense; aliases hurt recall, so you miss important results/links.
URIs & identifier management: URIs are like a single, global pool of identifiers – one world-wide primary key. You can claim ownership of parts of “URI space”, so even though we’re all using the same primary key, there is a mechanism for avoiding URI clashes. You can join data from multiple sources with confidence. But ... this doesn’t solve the alias problem; you still need to find co-references.
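The clash problem and the URI remedy can be sketched in a few lines of Python. Everything here is made up for illustration: the two small "databases", their records, and the `example.org` URI bases each source is assumed to own.

```python
# Two hypothetical databases; each uses small integers as local primary keys.
flies_db = {1: "gene: white", 2: "gene: yellow"}
papers_db = {1: "paper: eye colour study"}

# Hypothetical parts of "URI space" owned by each data source.
FLIES_BASE = "http://flies.example.org/gene/"
PAPERS_BASE = "http://papers.example.org/paper/"


def merge_by_local_key(*databases):
    """Naive merge: later records silently overwrite clashing keys."""
    merged = {}
    for db in databases:
        merged.update(db)
    return merged


def merge_by_uri(sources):
    """Merge after prefixing each local key with its owner's URI base."""
    merged = {}
    for base, db in sources.items():
        for key, record in db.items():
            merged[base + str(key)] = record
    return merged


clashing = merge_by_local_key(flies_db, papers_db)
joined = merge_by_uri({FLIES_BASE: flies_db, PAPERS_BASE: papers_db})

print(len(clashing))  # key 1 clashed, so one record was lost
print(len(joined))    # all three records survive under distinct URIs
```

The sketch does nothing about aliases: if two sources mint different URIs for the same gene, the merged data still holds two entries until someone records that they co-refer.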
Data discovery: The problem with distributed data ... how do you find everything that’s “out there”? Two general approaches: centralised and decentralised.
Centralised discovery: Someone somewhere keeps a “catalogue” of everything. Everyone “knows” where that catalogue is. New sources “tell” the catalogue about themselves (a.k.a. “register” themselves). E.g. Gas maintenance. Works well at small-medium scales, e.g. FlyWeb. Relies on networks outside the Web (e.g. knowing the right people).
FlyWeb Project [small number of large data resources]
Decentralised discovery: Data in one source refers to data in another (using a URI). Data from the other source can be retrieved directly, by “de-referencing” the URI via the Web. So given one data source, you can “follow your nose” and retrieve data from all linked sources ... without needing a central catalogue or registry, just the Web. Works well up to Web-scale, e.g. FOAF.
Dereferenceable? For some URIs, you can retrieve a “representation” of the “resource” via the Web. (N.B. “resource” = “thing”.)
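The “follow your nose” idea can be sketched without any network access by standing in for the Web with a dictionary: looking a URI up in the dictionary plays the role of de-referencing it. The people and their `example` URIs are hypothetical; the `knows` property URI is the one defined by the FOAF vocabulary.

```python
from collections import deque

# Property URI from the FOAF vocabulary.
KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical URIs for three people, each published by a different source.
ALICE = "http://example.org/alice"
BOB = "http://example.net/bob"
CAROL = "http://example.com/carol"

# A simulated Web: de-referencing a URI returns the triples published there.
WEB = {
    ALICE: [(ALICE, KNOWS, BOB)],
    BOB: [(BOB, KNOWS, CAROL)],
    CAROL: [],
}


def dereference(uri):
    """Stand-in for an HTTP GET; unknown URIs yield no data."""
    return WEB.get(uri, [])


def follow_your_nose(start):
    """Collect triples breadth-first, following URIs found in object position."""
    seen = {start}
    queue = deque([start])
    collected = []
    while queue:
        current = queue.popleft()
        for s, p, o in dereference(current):
            collected.append((s, p, o))
            if o.startswith("http://") and o not in seen:
                seen.add(o)
                queue.append(o)
    return collected


for triple in follow_your_nose(ALICE):
    print(triple)
```

Starting from one source, the loop reaches the other two without consulting any catalogue, which is the property the slide attributes to decentralised discovery.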
FOAF [very large number of relatively small data resources]
Why use URIs? Identity management: from 2 to 2 billion data sources, always a problem. Data discovery: the ability to “de-reference” a URI opens the possibility of decentralisation; the ability to “de-reference” is also useful in centralised models (e.g. registries can harvest).
SEMANTIC WEB DEPLOYMENT
W3C SWD WG: SKOS; RDFa; recipes for publishing RDF (linked data); vocabulary management.
W3C Semantic Web Activity: Semantic Web Deployment; Data Access (DAWG) – SPARQL query language, SPARQL protocol; GRDDL; OWL 2; SWHCLSIG; SWEO.
SUMMARY
Suggestions. Linked KOS: project LCSH into RDF (SKOS) – done; publish LCSH as linked data in the Web – done; publish SPARQL endpoint for LCSH. Linked metadata: project LOC metadata into RDF; publish LOC metadata as linked data in the Web, with links to LCSH & LCC; publish SPARQL endpoint for LOC metadata.
Bibliographic Information as RDF? Projecting LCSH into RDF ... SKOS is the standard vocabulary of resource & link types. Projecting LCSH metadata into RDF ... which vocabulary to use??? Challenge – diversity of bibliographic information!
RDA -> RDF: Joint DCMI/RDA task force; seed funding to develop initial prototype RDF vocabularies for bibliographic information, based on FRBR and the data model implicit in RDA; early stages. http://dublincore.org/dcmirdataskgroup/ Karen Coyle
Thanks: STFC Rutherford Appleton Lab – Brian Matthews, Michael Wilson, Juan Bicarregui; Oxford Image Bioinformatics Research Group – David Shotton, Graham Klyne, Jun Zhao; W3C Semantic Web Deployment WG; members of [email_address]. Comments on SKOS -> [email_address]

Simple Knowledge Organization System (SKOS) in the Context of Semantic Web Deployment, Library of Congress, May 2008

  • 1. The Simple Knowledge Organization System (SKOS) in the context of Semantic Web Deployment Alistair Miles http://purl.org/net/aliman Library of Congress May 2008
  • 2. THE FUTURE OF THE WEB http://purl.org/net/aliman
  • 3. Testimony of Sir Timothy Berners-Lee CSAIL Decentralized Information Group Massachusetts Institute of Technology Before the United States House of Representatives Committee on Energy and Commerce Subcommittee on Telecommunications and the Internet http://dig.csail.mit.edu/2007/03/01-ushouse-future-of-the-web.html http://purl.org/net/aliman
  • 4. I. Foundations of the Web “ The success of the World Wide Web, itself built on the open Internet, has depended on three critical factors: unlimited links from any part of the Web to any other; open technical standards as the basis for continued growth of innovation applications; and separation of network layers , enabling independent innovation for network transport, routing and information applications.” http://purl.org/net/aliman
  • 5. A. Universal linking: Anyone can connect to anyone... “ In simple terms, the Web has grown because it's easy to write a Web page and easy to link to other pages.” “ What makes it easy to create links ... is that there is no limit to the number of pages or number of links possible on the Web.” “ Adding a Web page requires no coordination with any central authority , and has an extremely low, often zero, additional cost.” http://purl.org/net/aliman
  • 6. “ Adding a page provides content, but adding a link provide the organization, structure and endorsement to information on the Web which turn the content as a whole into something of great value.” http://purl.org/net/aliman
  • 7. “ The universality and flexibility of the Web's linking architecture has a unique capacity to break down boundaries of distance, language, and domains of knowledge.” “ These traditional barriers fall away because the cost and complexity of a link is unaffected by most boundaries that divide other media.” http://purl.org/net/aliman
  • 8. “ The Web's ability to allow people to forge links is why we refer to it as an abstract information space , rather than simply a network.” http://purl.org/net/aliman
  • 9. II. Looking forward “ First, the Web will get better and better at helping us to manage, integrate, and analyze data .” “ Today, the Web is quite effective at helping us to publish and discover documents, but the individual information elements within those documents ... cannot be handled directly as data.” http://purl.org/net/aliman
  • 10. “ Today you can see the data with your browser, but can't get other computer programs to manipulate or analyze it without going through a lot of manual effort yourself.” “ As this problem is solved, we can expect that Web as a whole to look more like a large database or spreadsheet , rather than just a set of linked documents.” http://purl.org/net/aliman
  • 11. A. Data Integration “ Locked within all of this data is the key to knowledge about how to cure diseases, create business value, and govern our world more effectively.” “ The good news is that a number of technical innovations... ... (RDF which is to data what HTML is to documents, and the Web Ontology Language (OWL) which allows us to express how data sources connect together) ... ... along with more openness in information sharing practices are moving the World Wide Web toward what we call the Semantic Web .” http://purl.org/net/aliman
  • 12. “ Progress toward better data integration will happen through use of the key piece of technology that made the World Wide Web so successful: the link .” “ The power of the Web today, including the ability to find the pages we're looking for , derives from the fact that documents are put on the Web in standard form, and then linked together .” http://purl.org/net/aliman
  • 13. “ The Semantic Web will enable better data integration by allowing everyone who puts individual items of data on the Web to link them with other pieces of data using standard formats.” http://purl.org/net/aliman
  • 14. DATA WEBS FOR E-SCIENCE http://purl.org/net/aliman
  • 16. FlyWeb Project Fruit flies ( Drosophila ...) Model organism Extensive body of genetic research Much of that knowledge is in journal papers Recognised value of research data Establish public databases E.g. FlyBase Centrally curated http://purl.org/net/aliman
  • 18. Data Webs Link data resources Ask questions that no single data resource can answer What’s the easiest, cheapest, most scalable way to achieve this? Agile approach, add value incrementally, return value early and often... http://purl.org/net/aliman
  • 19. http://purl.org/net/aliman Vertical Web Apps Level 0 – Any Data Resources in the Web Level 1 – SPARQL End-points Level 2 – SPARQL End-points (Schema Alignment) Level 3 – SPARQL End-points (Integrated Data) Web 2 Mash-ups SPARQL Mash-ups SPARQL Mash-ups ??? Data Web Layer Cake
  • 20. Example Application [insert screenshot of mashup] http://purl.org/net/aliman
  • 21. Future, self-publishing As publishing data on the Web becomes easier... ...more research groups will publish their own data... ...rich network of data resources... ...challenging traditional view of scholarly life cycle & value chain ... value grid... http://purl.org/net/aliman
  • 23. Potted History SKOS 2001 (pre-alpha) Thesaurus Interchange Format (TIF), LIMBER Project SKOS 2003 (alpha) Semantic Web Advanced Development for Europe (SWAD-Europe) SKOS 2005 (beta) W3C Semantic Web Best Practices and Deployment Working Group (SWBPD) SKOS 2008 (W3C Recommendation) W3C Semantic Web Deployment Working Group (SWD) http://purl.org/net/aliman
  • 25. Layers in the Web http://www.w3.org/2007/Talks/1211-whit-tbl/#(23) Third layer is network (graph) of connections beyond documents... ... people, organisations, genes, proteins, concepts ... Represent these connections (data) in the (Semantic) Web http://purl.org/net/aliman
  • 26. KOS e.g. LCSH Can be viewed as a network of interconnected concepts Represent LCSH as data in the Web Make those concepts and their interconnections part of the Web http://purl.org/net/aliman
  • 31. Publishing KOS in the Web? Use RDF Basic framework for data in the Web – resources, literals, links... (“graphs” of data) http://purl.org/net/aliman
  • 32. Publishing KOS in the Web? Use SKOS Standard set of... Resource types (Classes) Link types (Properties) ... For representing KOS as RDF data (N.B. Because use URIS as names for classes and properties, call this an RDF vocabulary) http://purl.org/net/aliman
  • 33. SKOS Resource Types (Classes) skos:Concept E.g. Baseball in art skos:ConceptScheme E.g. LCSH itself
  • 34. SKOS Link Types (Properties) For labeling concepts skos:prefLabel, skos:altLabel, skos:hiddenLabel For documenting concepts skos:note, skos:scopeNote, skos:definition, skos:editorialNote... For linking concepts skos:broader, skos:narrower, skos:related
  • 38. Publishing LCSH in the Web Project LCSH into RDF (i.e. create an RDF representation) Publish it in the Web as linked data http://lcsh.info Ed Summers, Clay Redding, Dan Krech, Antoine Isaac
  • 39. Scope of SKOS SKOS will be an all-encompassing standard for the lossless representation and exchange of all varieties of knowledge organisation system ... ? No http://lists.w3.org/Archives/Public/public-swd-wg/2008Feb/0116.html -- Antoine Isaac
  • 40. “ ...the things that we aim at representing are very diverse: some classification schemes use ‘codes’ and refer to ‘classes’, thesauri have ‘terms’ and so on.”
  • 41. “ Yet, it happens, looking at the way these things are used now and will be in the near future (with more and more links established between them), that (i) some standardisation has to take place, and that (ii) this standardisation can be actually grounded on some observed practical similarities ( http://www.w3.org/TR/skos-ucr/ )” “ Our aim is not to replace the original objects in their initial context of use, but to allow to port them to a shared space , based on a simplified model , enabling wider re-use and better interoperability.”
  • 42. Lessons from the Web Less is more ... E.g. REST over SOAP SKOS should capture a small amount of common ground ... Just enough to enable KOS’s valuable concepts and connections to be deployed in the Web and be linked to/from N.B. SKOS is infinitely extensible! Easy to mix & match Easy to refine
  • 43. THE VALUE OF LINKS
  • 44. The value of links The Web showed that links between documents are really useful Google’s PageRank showed that the structure of the network means something (and is worth something!) Social networking Web sites showed how much we value other kinds of links
  • 45. Linked Metadata You’ve got LCSH in the Web, what next? ... Linked metadata...?
  • 49. [insert demo, show how links change topology of information space]
  • 51. Value Proposition Links are paths to the discovery of information Links can be exploited in useful (and surprising) ways Well-established KOS like LCSH can be hubs in the network of linked metadata, bridging ... (On the Semantic Web, LCSH should get very high semantic PageRank!)
  • 53. Why use URIs? Identifier management Data discovery
  • 54. Identifier management Referring to things In a database, each table has a primary key What happens when you try to combine data from 2 databases? Identifier clashes (ambiguous reference) Identifier aliases (co-reference) Clashes hurt precision, give you nonsense Aliases hurt recall, miss important results/links
  • 55. URIs & identifier management URIs are like a single, global pool of identifiers – one world-wide primary key Can claim ownership of parts of “URI space” Even though we’re all using the same primary key, there is a mechanism for avoiding URI clashes Can join data from multiple sources with confidence But ... doesn’t solve the alias problem, still need to find co-references
  • 56. Data discovery Problem with distributed data ... How do you find everything that’s “out there”? Two general approaches: Centralised Decentralised
  • 57. Centralised discovery Someone somewhere keeps a “catalogue” of everything Everyone “knows” where that catalogue is New sources “tell” the catalogue about themselves (a.k.a. “register” themselves) E.g. Gas maintenance Works well at small-medium scales E.g. FlyWeb Rely on networks outside the Web (e.g. Knowing the right people)
  • 58. FlyWeb Project [small number of large data resources]
  • 59. Decentralised discovery Data in one source refers to data in another (using a URI) Data from the other source can be retrieved directly, by “de-referencing” the URI via the Web So given one data source, you can “follow your nose” and retrieve data from all linked sources ... ... without needing a central catalogue or registry, just the Web Works well up to Web-scale E.g. FOAF
  • 60. Dereferenceable? For some URIs, can retrieve a “representation” of the “resource” via the Web (N.B. “resource” = “thing”)
  • 61. FOAF Very large number of relatively small data resources
  • 62. Why use URIs? Identifier management From 2 to 2 billion data sources, always a problem Data discovery Ability to “de-reference” a URI opens possibility for decentralisation Ability to “de-reference” is also useful in centralised models (e.g. Registries can harvest)
  • 63. SEMANTIC WEB DEPLOYMENT
  • 64. W3C SWD WG SKOS RDFa Recipes for publishing RDF (linked data) Vocabulary management
  • 65. W3C Semantic Web Activity Semantic Web Deployment Data Access (DAWG) SPARQL query language, SPARQL protocol GRDDL OWL 2 SWHCLSIG SWEO
  • 67. Suggestions Linked KOS Project LCSH into RDF (SKOS) – done Publish LCSH as linked data in the Web – done Publish SPARQL endpoint for LCSH Linked metadata Project LOC metadata into RDF Publish LOC metadata as linked data in the Web With links to LCSH & LCC Publish SPARQL endpoint for LOC metadata
  • 68. Bibliographic Information as RDF? Projecting LCSH into RDF ... SKOS is standard vocabulary of resource & link types Projecting LCSH metadata into RDF ... Which vocabulary to use??? Challenge – diversity of bibliographic information!
  • 69. RDA -> RDF Joint DCMI/RDA task force Seed funding to develop initial prototype RDF vocabularies for bibliographic information Based on FRBR and data model implicit in RDA Early stages http://dublincore.org/dcmirdataskgroup/ Karen Coyle
  • 70. Thanks STFC Rutherford Appleton Lab Brian Matthews, Michael Wilson, Juan Bicarregui Oxford Image Bioinformatics Research Group David Shotton, Graham Klyne, Jun Zhao W3C Semantic Web Deployment WG Members of [email_address] Comments on SKOS -> [email_address]
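
Slides 33 and 34 list the SKOS resource types and link types; a small worked example may make them concrete. The sketch below is only an illustration, not how LCSH is actually published: it stores RDF-style triples as plain Python tuples, and every http://example.org/ URI, the second concept and the links between the two concepts are assumptions made up for the example. Only the SKOS and RDF namespace URIs and the property names are real; skos:inScheme is a standard SKOS property that the slides do not mention.

```python
# Minimal sketch: SKOS data as plain (subject, predicate, object) triples.
# Simplification: URIs and literal labels are both ordinary Python strings.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs, invented for this example (not real LCSH identifiers).
SCHEME = "http://example.org/lcsh"
BASEBALL_IN_ART = "http://example.org/lcsh/baseball-in-art"
BASEBALL = "http://example.org/lcsh/baseball"

triples = {
    # Resource types (classes)
    (SCHEME, RDF_TYPE, SKOS + "ConceptScheme"),
    (BASEBALL_IN_ART, RDF_TYPE, SKOS + "Concept"),
    (BASEBALL, RDF_TYPE, SKOS + "Concept"),
    # Membership of a scheme (skos:inScheme is not on the slides)
    (BASEBALL_IN_ART, SKOS + "inScheme", SCHEME),
    (BASEBALL, SKOS + "inScheme", SCHEME),
    # Labeling
    (BASEBALL_IN_ART, SKOS + "prefLabel", "Baseball in art"),
    (BASEBALL, SKOS + "prefLabel", "Baseball"),
    # Linking concepts (an assumed relationship, for illustration only)
    (BASEBALL_IN_ART, SKOS + "related", BASEBALL),
    (BASEBALL, SKOS + "related", BASEBALL_IN_ART),
}


def objects(graph, subject, predicate):
    """Return, sorted, every object linked from subject by predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def related_labels(graph, concept):
    """Return the preferred labels of all concepts skos:related to concept."""
    labels = []
    for other in objects(graph, concept, SKOS + "related"):
        labels.extend(objects(graph, other, SKOS + "prefLabel"))
    return labels
```

Calling related_labels(triples, BASEBALL_IN_ART) walks one skos:related link and reads the label at the other end, which is the kind of traversal that becomes possible once a KOS is expressed as data.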

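The "follow your nose" discovery described on slide 59 can be sketched as a simple breadth-first traversal. This is a hedged illustration under stated assumptions: instead of real HTTP requests, a dictionary named FAKE_WEB stands in for the Web, the dereference function is a hypothetical stand-in for retrieving a representation of a URI, and all people URIs are invented. The FOAF property URIs are real.

```python
from collections import deque

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical URIs on three different hosts, invented for this example.
ALICE = "http://example.org/people/alice"
BOB = "http://example.net/people/bob"
CAROL = "http://example.com/people/carol"

# Simulated Web: each URI maps to the triples its publisher serves for it.
FAKE_WEB = {
    ALICE: [(ALICE, FOAF_KNOWS, BOB)],
    BOB: [(BOB, FOAF_KNOWS, CAROL), (BOB, FOAF_KNOWS, ALICE)],
    CAROL: [(CAROL, FOAF_NAME, "Carol")],
}


def dereference(uri):
    """Stand-in for an HTTP GET; a URI that cannot be dereferenced yields no data."""
    return FAKE_WEB.get(uri, [])


def follow_your_nose(start_uri, max_sources=100):
    """Collect data from every source reachable by links, with no central registry."""
    visited = set()
    queue = deque([start_uri])
    collected = []
    while queue and len(visited) < max_sources:
        uri = queue.popleft()
        if uri in visited:
            continue
        visited.add(uri)
        for s, p, o in dereference(uri):
            collected.append((s, p, o))
            # Only objects that look like URIs are followed; literals are not.
            if o.startswith("http://") and o not in visited:
                queue.append(o)
    return visited, collected
```

Starting from ALICE alone, the traversal reaches BOB and then CAROL purely through the links in the data. The max_sources cap is an assumption added here because a real crawl over Web-scale linked data would need some bound.
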
Editor's Notes

  • #2: Let me start by saying that I am here to promote SKOS. Having said that, I would like to be as objective as I can about both the technical and the business arguments for using SKOS, and for buying in to the Semantic Web technology ecosystem. I’d like to do that because I’d like to use this presentation as a kind of sanity check. Libraries such as LOC who have invested over a long period of time in knowledge organisation have always been acknowledged as primary stakeholders in the development of SKOS, and SKOS has always been about providing a means for libraries to extract and to share more value from their knowledge organisation systems. So I’d very much like to know whether what I say today makes sense from your point of view. This is especially relevant as SKOS nears completion. The main technical work on SKOS will effectively be complete by the end of June.