Building Blocks for the Future: Making Controlled Vocabularies Available for the Semantic Web Dr. Barbara B. Tillett Chief, Policy & Standards Division  Library of Congress For ELAG, May 2011
DBpedia Linked Data  National Library of Sweden LCSH VIAF
Internet “Cloud” Databases, Repositories Web front end Services
Internet “Cloud” Web front end Services VIAF Databases, Repositories LCSH
VIAF Objectives Facilitate exposure of authority data Reduce cataloging costs Simplify authority control (creation and maintenance) internationally Provide authority data in form, language, and script users want
VIAF 歌川, 広重 2世 1826-1869 / Utagawa, Hiroshige, 1826?-1869
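A minimal sketch of the idea behind this slide: one cluster holds several forms of the same name, and a display layer picks the form that matches the user's preference. The language tags, the `CLUSTER` structure, and the `pick_heading` helper are assumptions made for illustration; they are not part of VIAF itself.

```python
from typing import Dict, List, Optional

# One cluster: the same person as recorded in different authority files.
# The language tags are assumptions added for this sketch.
CLUSTER: Dict[str, str] = {
    "ja": "歌川, 広重 2世 1826-1869",
    "en": "Utagawa, Hiroshige, 1826?-1869",
}


def pick_heading(cluster: Dict[str, str], preferences: List[str]) -> Optional[str]:
    """Return the first heading that matches the user's ordered preferences."""
    for lang in preferences:
        if lang in cluster:
            return cluster[lang]
    # Fall back to any available form rather than showing nothing.
    return next(iter(cluster.values()), None)


# A user who prefers French, then English, gets the Latin-script form.
print(pick_heading(CLUSTER, ["fr", "en"]))
```

The fallback to any available form is a design choice for the sketch: an unfamiliar script is arguably still more useful than an empty display.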
VIAF: The Virtual International Authority File. Original VIAF partners: Library of Congress (LC), Deutsche Nationalbibliothek (DNB), Bibliothèque nationale de France (BnF), OCLC (host). Virtually combining the name authority files of all institutions into a single name authority service. http://viaf.org/
Virtual International Authority File Matches names across 21 authority files of 18 institutions 18.4 million name records 14.5 million clusters Based on KSY Cooperative Identities Hub, CEAL 2010-03
Library of Congress/NACO; Deutsche Nationalbibliothek; Bibliothèque nationale de France; National Library of Australia; National Library of the Czech Republic; Bibliotheca Alexandrina (Egypt); Getty Research Institute; National Library of Israel; Istituto Centrale per il Catalogo Unico (Italy); Biblioteca Nacional de Portugal; Biblioteca Nacional de España; National Library of Sweden; Swiss National Library; Vatican Library; NUKAT Center (Poland); Library and Archives Canada; National Széchényi Library (Hungary); RERO (Switzerland)
Current Status: Available as linked data with URIs (Uniform Resource Identifiers); Unicode throughout; MARC 21, UNIMARC, and RDF supported; Usage tripled this last year; Thousands of visits daily
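The notes for this slide describe linked data in terms of HTTP 303 redirects and content negotiation. The sketch below shows what that looks like from a client's side: it prepares a request that asks for an RDF view and decides what to do with the response status. The example.org URIs, the `application/rdf+xml` media type, and both helper names are assumptions for illustration; nothing is sent over the network.

```python
from typing import Optional
from urllib.request import Request

# Assumed media type for the RDF view; the slides only say that RDF is supported.
RDF_XML = "application/rdf+xml"


def build_request(uri: str, media_type: str = RDF_XML) -> Request:
    """Prepare (but do not send) a GET request that asks for one data format."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


def next_step(status: int, location: Optional[str]) -> str:
    """Describe what a client does with the response to a linked-data URI."""
    if status == 303 and location:
        # 303 See Other: the URI names a thing; a document about it lives elsewhere.
        return f"follow redirect to {location}"
    if 200 <= status < 300:
        return "parse the returned document"
    return "give up"


# Hypothetical identifier, used only to show the shape of the exchange.
req = build_request("http://example.org/id/123")
print(req.get_header("Accept"))
print(next_step(303, "http://example.org/doc/123.rdf"))
```

Keeping the identifier URI separate from the document URI is what lets one name be served as HTML to a person and as RDF to a program.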
Enhancing the Authorities Bibliographic Record Derived Authority Authority Record Enhanced Authority
Mining the Bibliographic Record
LDR 00638ncm a22002057a 450
001 5773347
005 19960820101947.4
008 960815s1965 oruuua n eng
010 $a 96753638
040 $a DLC $c DLC
019 $a 17706440
020 $c $2.95
028 22 $a 48418 $b Matrix Publ. Co.
045 2 $b d198006 $b d198007
048 $b va01 $b ve01 $a ka01
050 00 $a M1258 $b .L
100 1 $a Leigh, Mitch, $d 1928-
245 14 $a The man of La Mancha / $c by Mitch Leigh & Joe Darion; arr. By Roland Barrett & Alan Keown.
260 $a Springfield, OR : $b Matrix Publ. Co., $c c1965.
300 $a 1 score (16 p.) ; $c 18 x 27 cm.
500 $a Brief record.
650 0 $a Musicals $x Excerpts.
600 10 $a Leigh, Mitch $x Musical settings.
700 1 $a Darion, Joe.
Callouts: Title; Date of Publication; Authors; LC Control Number; LC Classification; Material Type; Publisher; Place of Publication; Language Usage
Derived Authority Record
00505cz a2200157n 450
001 xlc 1
003 OCoLC
005 19880921165012.4
008 880831n|acannaab|n aaa c
040 $a OCoLC $b eng $c OCoLC $f viaf
100 1 $a Leigh, Mitch.
903 $a 88030979
910 14 $a the man of la mancha
921 $a matrix publ co
922 $a oru
930 $a mitch leigh
940 $a eng
942 $a 234
943 $a 196x
944 $a cm
950 1 $a darian, joe $d 1928-
Callouts: All text is normalized; Subjects are grouped into broad subject areas; Material type is coded; Publication date is by decade; Coauthor
Enhanced Authority Record
00505cz a2200157n 450
001 oca01144962
005 19880921165012.4
008 840702n| acannaab| |n aaa |||
010 $a n 88090379
040 $a DLC $c DLC $d DLC
100 1 $a Leigh, Mitch, $d 1928-
670 $a the man of la mancha, c1966: $b t.p. (Mitch Leigh)
903 $a 84758340 $9 1
903 $a 93710923 $9 1
910 11 $a impossible dream $9 1
910 11 $a century library of music and sound by mitch leigh $9 1
921 $a matrix publ co $9 1
921 $a kapp $9 2
922 $a oru $9 2
930 $a mitch leigh $9 1
940 $a eng $9 2
942 $a 234 $9 2
943 $a 196x $9 1
943 $a 197x $9 1
944 $a cm $9 2
950 11 $a darian, joe $d 1928- $9 1
950 11 $a wasserman, dale $9 1
Information in Bibliographic Records: He writes music. His primary subject area is music. He was published in the 1960s and 1970s by Matrix Publ. Co. in Oregon and Kapp in New York. Worked with Joe Darion and Dale Wasserman. Mitch Leigh is the only name he has used on his publications. Etc.
http://www.viaf.org Hosted by OCLC
viaf.org
Cervantes Saavedra, Miguel de 1547 Cervantes de Salazar, Francisco, ca. 1514 Cervantes, 1823-1898 Cervantes Juan, 1395-1458 Cervantes, Ignacio, 1847-1905 Cervantes, Juan de, 1382-1453 Cervantès, François, 1959- Cervani, Giulio, 1919- Cervantes, María Antonieta Cervantes de Haro, fl. 1908-193- As viewed Nov. 1, 2010 cer
Cervantes
Cervantes Preferred Forms
MARC 21 Cervantes
RDF Cervantes
VIAF and Catalogers. Use as a reference tool: to resolve conflicts, questionable dates, forms of name, etc. Cite as source in 670 $a, for example: BNF in VIAF, date searched; Nat. Lib. of Australia in VIAF, date searched; LAC in VIAF, date searched
Next steps for VIAF: Better searching; More “Linked data” (related persons as in WorldCat Identities, Wikipedia, etc.); Participants beyond libraries (rights management agencies, publishers, museums, archives); More name types (corporate and family names, uniform titles, geographic names … not topical terms)
SKOS: Simple Knowledge Organization System. “Provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabulary” (SKOS Primer)
SKOS Based on the Resource Description Framework (RDF) Resources can be exchanged between software applications and published on the Web Interconnects data on the Web, helping create the Semantic Web
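To make the SKOS model concrete, here is a small sketch that writes one concept as Turtle using `skos:prefLabel`, `skos:broader`, and `skos:closeMatch`. The example.org URIs, the sample label, and the `concept_to_turtle` helper are hypothetical; a production system would use an RDF library and proper string escaping rather than plain string formatting.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_turtle(uri, pref_label, lang, broader=(), close_match=()):
    """Serialize one SKOS concept as a small Turtle document (no escaping)."""
    lines = [f"@prefix skos: <{SKOS_NS}> .", "", f"<{uri}> a skos:Concept ;"]
    statements = [f'    skos:prefLabel "{pref_label}"@{lang}']
    # Hierarchical link to a more general concept in the same scheme.
    statements += [f"    skos:broader <{b}>" for b in broader]
    # Mapping link to a similar concept in another vocabulary.
    statements += [f"    skos:closeMatch <{m}>" for m in close_match]
    lines.append(" ;\n".join(statements) + " .")
    return "\n".join(lines)


# Hypothetical URIs; only the SKOS namespace above is real.
example = concept_to_turtle(
    "http://example.org/subjects/musicals",
    "Musicals",
    "en",
    broader=["http://example.org/subjects/musical-theater"],
    close_match=["http://example.org/other-scheme/comedies-musicales"],
)
print(example)
```

The `closeMatch` statement is the kind of link that could connect a heading in one vocabulary to its counterpart in another.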
id.loc.gov/authorities “Authorities & Vocabularies” from the Library of Congress Intent:  To provide human and programmatic access to commonly found standards and vocabularies developed by LC
“Authorities & Vocabularies”: LCSH was the first offering (subject headings, genre/form headings, children’s subject headings, subdivision records, validation records). Provides links from LCSH headings to RAMEAU headings. Exploring Répertoire de vedettes-matière (RVM) and others
“Authorities & Vocabularies” also includes: Thesaurus for Graphic Materials (TGM); MARC geographic area codes; MARC language codes; MARC relator codes; Preservation Events; … etc.
“Authorities & Vocabularies” benefits: Servers can download entire controlled vocabularies and the values within them, in multiple formats. Available for free on the Web
“Authorities & Vocabularies”: Human end-users can search and view individual headings and data elements (details of the record, visualization) and suggest additions, changes
“Authorities & Vocabularies” URI for specific LCSH records/concepts: id.loc.gov/authorities/[LCCN], for example id.loc.gov/authorities/sh8508803
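The URI pattern on this slide can be sketched as a one-line template. The `http://` scheme, the removal of internal spaces, and the `lcsh_uri` name are assumptions made for this illustration; the control number is the one shown on the slide.

```python
# The scheme is an assumption; the slide gives the pattern without one.
BASE = "http://id.loc.gov/authorities/"


def lcsh_uri(lccn: str) -> str:
    """Build a concept URI from a control number, dropping any spaces."""
    compact = "".join(lccn.split()).lower()
    if not compact:
        raise ValueError("empty LCCN")
    return BASE + compact


# Control numbers are often displayed with a space after the prefix.
print(lcsh_uri("sh 8508803"))
```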
Contact information Content of site:  Libby Dechman, edec@loc.gov Technical questions:  Larry Dixson, ldix@loc.gov “Authorities & Vocabularies”
“Authorities & Vocabularies”: A comment form and discussion list are available at http://id.loc.gov/authorities/contact.html
RDA Controlled Vocabularies - Registries: Free on the Web at Open Metadata Registry http://metadataregistry.org/schema/list.html
http://metadataregistry.org/rdabrowse.htm
Carrier type
URI
RDA Carrier Types URI
RDA Linked Data Don Quixote Madrid, 1979 English Spanish French German Cervantes Library of Congress Copy 1 Green leather binding Exemplary novels Wasserman The Man of La Mancha Text Movies … Derivative works Subject created created created
RDA Linked Terms for Languages Don Quijote Madrid, 1979 Inglés Español Francés Alemán Cervantes Library of Congress Copia 1 Encuadernación en piel color verde Novelas Ejemplares   Wasserman The Man of La Mancha Texto Películas … Obras  derivadas Materias
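The two diagrams above show the same linked data with labels in different languages. A minimal sketch of that idea: statements are stored against identifiers, and the display label is chosen per language at render time. The example.org URIs and the `label` and `render` helpers are hypothetical; the labels themselves are taken from the two slides.

```python
# Hypothetical identifiers; a real registry would supply the actual URIs.
EX = "http://example.org/"

# Labels per identifier and language, taken from the two slides.
LABELS = {
    EX + "agent/cervantes": {"en": "Cervantes"},
    EX + "work/don-quixote": {"en": "Don Quixote", "es": "Don Quijote"},
    EX + "work/exemplary-novels": {"en": "Exemplary novels", "es": "Novelas Ejemplares"},
}

# The statements themselves do not change with the display language.
TRIPLES = [
    (EX + "agent/cervantes", "created", EX + "work/don-quixote"),
    (EX + "agent/cervantes", "created", EX + "work/exemplary-novels"),
]


def label(uri, lang, fallback="en"):
    """Pick a label in the requested language, then the fallback, then the URI."""
    names = LABELS.get(uri, {})
    return names.get(lang) or names.get(fallback) or uri


def render(triples, lang):
    """Render each statement with labels in the requested language."""
    return [f"{label(s, lang)} -- {p} --> {label(o, lang)}" for s, p, o in triples]


for line in render(TRIPLES, "es"):
    print(line)
```

Because only the label lookup depends on language, adding another language means adding labels, not rewriting the statements.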
Internet “Cloud” Web front end Services VIAF Databases,  Repositories LCSH

Editor's Notes

  • #2: Building Blocks for the Future: Making Controlled Vocabularies Available for the Semantic Web Abstract: Efforts have been underway for several years to lay some of the building blocks for linked data, cloud computing, and foundations for the Semantic Web. How are libraries contributing? Some of the projects that the Library of Congress has contributed to are described: the Virtual International Authority File; the id.loc.gov site for posting LC’s own controlled vocabularies, like the Library of Congress Subject Headings (LCSH) in SKOS (Simple Knowledge Organization System) format; and the controlled vocabularies prepared for the new cataloging code, RDA: Resource Description and Access. Credits: Some of the slides are from Ed O’Neill (on the VIAF algorithms), Thom Hickey (on VIAF statistics), and Karen Smith-Yoshimura (on VIAF images for Chekov) from OCLC Research – with their permission. Today I’d like to share with you some things we are doing now - that work now on the Web - and that we hope will also be even more useful in the future as systems are designed to take better advantage of the rich data that libraries have provided for centuries through our bibliographic and authority records. We can repackage that data so it can be used in creative ways to better serve our users. Let me start with linked data on the Web.
  • #3: “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/” Also see: http://richard.cyganiak.de/2007/10/lod/ (Modified notes from Karen Smith-Yoshimura) DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. The idea is that this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways, and that it might inspire new mechanisms for navigating, linking and improving the encyclopaedia itself as well as the linked data, including connections to library data – <click> here showing the link to the database at the National Library of Sweden. This image shows the links as of last July and there are now many more. According to their online site as of September 2010, the DBpedia knowledge base describes more than 2.9 million things, including at least 282,000 persons, 339,000 places (including 241,000 populated places), 88,000 music albums, 44,000 films, 15,000 video games, 119,000 organizations (including 20,000 companies and 29,000 educational institutions), 130,000 species and 4400 diseases. The DBpedia knowledge base has labels and abstracts for these things in at least 91 different languages; 807,000 links to images and 3,840,000 links to external Web pages; 4,878,100 external links into other RDF datasets, 415,000 Wikipedia categories, and 75,000 YAGO categories.
  • #4: Information systems and content, like digital resources, images on Flickr, and our bibliographic and authority data may all be freely accessible on the Web or available for some nominal fee. They are now part of the Internet cloud computing environment that we have today with Amazon, Google, and other systems where the elements that describe our resources are available to libraries and users everywhere in the world – not just on an institution’s computer, but shared and available to everyone through the Internet. The data comes from publishers, from the creators of the resources, from trusted libraries and other institutions, and can be augmented by further descriptive information from anyone who wants to help. All of the data about information resources in our bibliographic universe of things we have in our library collections and in archives will be accessible by any user anywhere at any time.
  • #5: Bibliographic data and digital resources are on the Web now and we’ve started adding the controlled vocabularies to help identify resources – such as the controlled values for naming the types of content (like sound, text, still images, and so on), types of carriers (like a film reel, a computer disc, a volume), and other elements in RDA that have controlled lists of values. Some of the RDA vocabularies are already being registered on the Web and can be used to present displays and show pathways to related resources. So today, I’d like to show you three services or building blocks for this linked Web environment that the Library of Congress is involved in with other libraries: <click> VIAF (Virtual International Authority File) <click> LCSH – Library of Congress Subject Headings <click> And the controlled vocabularies from the new cataloging code, RDA: Resource Description and Access
  • #6: VIAF, the Virtual International Authority File, was established to provide a free service on the Web to help share the authority data created by libraries around the world. Libraries can use it to help reduce the costs of cataloging, because authority control is one of the most expensive aspects of cataloging operations. VIAF also was designed to make authority control easier on an international scale, where libraries could help each other maintain the data. And longer term, we hope the VIAF data can be used through linked data services to enable displaying our bibliographic data in the form, language, and script that end users want.
  • #7: For example, if a user gets information about Hiroshige Utagawa in Japanese while searching on the Web, but they have set up their profile to see English in our Latin script – a system application could use VIAF to help display the desired script – for searching further in France or in the US, and could move through the Internet to linked available works by or about this artist.
  • #8: The concept of a shared authority file has been discussed within IFLA since the 1970s as part of IFLA’s goal of “universal bibliographic control”: a single heading for each person, corporate body, etc. to be used by everyone in the world. What’s wrong with that idea if you are interested in Hiroshige’s works and the heading is only to be presented in Japanese? Also, the technology wasn’t available in the 1970s to realize international sharing of authority data. We didn’t have automated system capabilities to support catalogers doing authority work and to do as much of it as possible “automatically.” So, IFLA moved away from the principle of a single heading in response to users’ needs for language and scripts and to accommodate different cataloging codes. Parallel to IFLA’s efforts, the initial VIAF project in 2003 was a partnership between the Library of Congress, the Deutsche Nationalbibliothek (the national library of Germany), and OCLC. Various possible models were considered including those used by other authority file projects: Project LEAF, InterParty Project, and Project AUTHOR, but we decided to use a centralized model with links to different national authority files. OCLC developed algorithms to use information in the bibliographic records and information in authority records to match names in the two authority files. By 2007 the Bibliothèque nationale de France had joined the partners.
  • #9: Now, VIAF has 18 participants. Israel is contributing 4 separate authority files for different scripts (Latin script, Hebrew, Cyrillic, and Arabic), so there are 21 authority files with 18.4 million name records in about 14.5 million clusters.
  • #10: In no particular order, these are the current participants. Notice that “Library of Congress/NACO” includes the British Library, the National Library of New Zealand, and all of our other NACO partners around the world, so there are actually more institutions represented than just those listed as participants. The most recent addition is the set of authority records from RERO in Switzerland. The National Institute for Informatics in Japan, the National Diet Library of Japan, the Russian State Library, the National Library of Slovenia, and a consortium in Belgium are also in progress.
  • #11: From Thom Hickey’s blog, Sept. 25, 2009: VIAF (the Virtual International Authority File) is now available as linked data. Linked data means: URIs (uniform resource identifiers) for everything; HTTP 303 redirects for URIs representing the personae our metadata is about [303 See Other, since HTTP/1.1]; HTTP content negotiation for different data formats; an RDF view of the data [the Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web]; and a rich set of internal and external links in our data. The scripts are in Unicode and the data can be submitted as either UNIMARC, MARC 21, or MARCXML. We also learned from Thom Hickey that the usage of VIAF tripled this past year, with thousands of visits daily and millions of visits by machines.
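To make the 303 redirect and content negotiation points concrete, here is a minimal Python sketch of how a server might answer a request for a URI that names a person. It is an illustration only, not VIAF's actual implementation: the `example.org` URI, the `about.html` and `about.rdf` suffixes, and the function names are all assumptions.

```python
# Hypothetical mapping from media type to the document that describes the
# entity. The suffixes are invented for illustration; they are not VIAF's paths.
REPRESENTATIONS = {
    "text/html": "about.html",
    "application/rdf+xml": "about.rdf",
}
DEFAULT_MEDIA_TYPE = "text/html"


def parse_accept(header):
    """Return the media types in an Accept header, highest quality first."""
    prefs = []
    for part in header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        quality = 1.0
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if media and quality > 0:
            prefs.append((quality, media))
    # list.sort is stable, so equal-quality types keep their header order.
    prefs.sort(key=lambda pref: -pref[0])
    return [media for _, media in prefs]


def negotiate(entity_uri, accept_header):
    """Answer a request for an entity URI with (status, redirect target).

    The entity (a person) cannot be sent over HTTP, so the server replies
    303 See Other and points at a document about the person, in the format
    the client asked for.
    """
    base = entity_uri.rstrip("/")
    for media in parse_accept(accept_header):
        if media == "*/*":
            media = DEFAULT_MEDIA_TYPE
        if media in REPRESENTATIONS:
            return 303, base + "/" + REPRESENTATIONS[media]
    return 406, None  # nothing acceptable to this client


if __name__ == "__main__":
    uri = "http://example.org/authority/12345"  # hypothetical identifier
    print(negotiate(uri, "application/rdf+xml"))
    print(negotiate(uri, "text/html;q=0.9, application/rdf+xml;q=0.5"))
```

The same identifier serves both a browser and a machine client; only the Accept header differs.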
  • #12: For every personal name used in either a 100 or a 700 field, a derived authority record is created. In addition to the name, the derived authority record includes a coded summary of the material published. A bibliographic record with multiple personal names will generate multiple derived authority records. All of the derived authority records for a particular person will be clustered with the authority record for the individual. The contents of all the derived authority records for the individual are added to form the enhanced authority record.
  • #13: Two types of information are collected from the bibliographic record associated with the personal names. First, information specific to a resource is collected such as the title and LC Control number. Second, information that may apply to multiple works by an author is obtained. The latter information may help to match authors when specific title matches are not available. The OCLC Algorithms capture and label such data as … (walk through slide)
  • #14: The Derived Authority Record moves the focus of the information in the bibliographic record from the manifestation to the person. This derived record is for Mitch Leigh who wrote the music. A similar derived authority record would also be created for Joe Darion - another composer, and the arrangers Roland Barrett & Alan Keown. Local fields (9xx) are used to store the information derived from the bibliographic record. All of the text (titles, publishers, etc.) is normalized using the NACO normalization rules. All upper case characters are changed to lower case, most diacritics are removed, and most punctuation is dropped. Subjects are entered using the number associated with the subject groups from the National Shelf List. This converts the LC classification into approximately 500 broad subject areas. The decade of publication is used for the publication dates.
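The normalization steps described here (lower-casing, removing diacritics, dropping punctuation, reducing dates to decades) can be sketched in a few lines of Python. This is a deliberately simplified approximation, not the full NACO normalization rules, which treat certain characters specially; the function names are assumptions.

```python
import unicodedata


def normalize_text(text):
    """Simplified, NACO-like normalization (an approximation, not the real rules)."""
    # Decompose accented letters into base letter + combining mark,
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Lower-case, and replace anything that is not a letter, digit,
    # or whitespace with a space.
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " "
        for ch in without_marks.lower()
    )
    # Collapse runs of whitespace.
    return " ".join(cleaned.split())


def publication_decade(year):
    """Reduce a publication year to the first year of its decade."""
    return year - year % 10


if __name__ == "__main__":
    print(normalize_text("National Széchényi Library (Hungary)"))
    print(publication_decade(1966))
```

Normalizing both sides before comparison means that differences in case, accents, or punctuation between two libraries' records do not prevent a match.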
  • #15: To build the Enhanced Authority Record, the original Personal Name Authority Record is combined with all of the available derived authority records for the author. Repeated information is counted to increase its relevance in the matching analysis. The $9 subfield shows the frequency of the attribute. For example, it shows that two (in this case, all) of his publications are in English.
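The flow from bibliographic record to derived record to enhanced record can be sketched as follows. This is a toy model under stated assumptions: the dictionary fields, the placeholder titles, and the exact-name matching are invented for illustration, whereas the real OCLC algorithms work on MARC fields and use much richer matching evidence.

```python
from collections import Counter


def derive_records(bib):
    """Create one derived record per personal name in a bibliographic record."""
    return [
        {
            "name": name,
            "title": bib["title"],
            "language": bib["language"],
            "decade": bib["year"] - bib["year"] % 10,
        }
        for name in bib["names"]
    ]


def enhance(authority, derived_records):
    """Merge an authority record with the derived records for the same name.

    Repeated attributes are counted, playing the role of the frequency
    shown in the $9 subfield.
    """
    matching = [d for d in derived_records if d["name"] == authority["name"]]
    enhanced = dict(authority)
    enhanced["titles"] = sorted({d["title"] for d in matching})
    enhanced["languages"] = Counter(d["language"] for d in matching)
    enhanced["decades"] = Counter(d["decade"] for d in matching)
    return enhanced


if __name__ == "__main__":
    # Hypothetical bibliographic records; titles and years are placeholders.
    bibs = [
        {"title": "Title A", "language": "eng", "year": 1966,
         "names": ["Leigh, Mitch", "Darion, Joe"]},
        {"title": "Title B", "language": "eng", "year": 1972,
         "names": ["Leigh, Mitch"]},
    ]
    derived = [rec for bib in bibs for rec in derive_records(bib)]
    record = enhance({"name": "Leigh, Mitch"}, derived)
    print(record["languages"])
```

With two English-language publications, the language counter for this name holds a frequency of 2 for "eng", which mirrors the "two (in this case, all)" case in the slide.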
  • #16: There is a lot of information that you can mine from the bibliographic and authority data – like the things shown here. Hickey
  • #17: This is the URL for VIAF, and I encourage you all to take a look. How many of you are already using it?
  • #18: Here’s what you see when you go to that Web site.
  • #19: There is a Google-type search box. You have one drop-down box to select whether you’d like to search by name or by title, another drop-down box to select a particular authority file or all of VIAF, and a search box to type in the name or title you want to search. As you begin typing, the system offers a drop-down list of suggested names that match what you are typing. Let’s select Cervantes, and you then retrieve (next screen)
  • #20: The list of the authorized forms of names found in VIAF – attached to each is a little flag for the participating institutions and a sample title on the right to assure you have the one you were looking for. We can scroll further down the screen.
  • #21: And we have the ability to scroll through various cover art associated with the publications by Cervantes.
  • #22: Further down is the complete listing of the preferred forms of name – with the contributing institution’s flag.
  • #23: If you don’t recognize which institution the flag belongs to, you can click on the flag in the circle to the right and up will pop the name as well as the record number for Cervantes in their authority file.
  • #24: Further down the screen we have selected titles.
  • #25: A bit further down we have countries of publication. Remember from the mined data that we captured how many times a particular country was found in the bibliographic records associated with the name; here it is used to show the relative volume of publications coming from each of the countries. Isn’t this a lot more interesting than seeing the MARC country codes? VIAF shows some ideas of how we can more creatively re-use the data we already have in our records.
  • #26: This is a timeline of publication dates for Cervantes’ books. Each bar on the timeline is a decade.
  • #27: and when you scroll over the bar, you get statistics about how many publications are in that decade as represented in the contributing databases.
  • #28: VIAF also displays information about the person – here showing that Cervantes was a male, nationality – Spain, and VIAF shows the primary language used by this person as seen in various manifestations of his works. And there are some links to Wikipedia and to WorldCat Identities (since it is a product associated with OCLC) and there is the possibility to have other links in the future.
  • #29: VIAF also has the MARC 21, UNIMARC, and RDF formatted “enhanced authority records” for each person.
  • #30: Here’s what the data looks like as RDF (Resource Description Framework).
  • #31: The Policy & Standards Division (PSD) at LC has had several questions on whether catalogers (LC and PCC, the Program for Cooperative Cataloging) should be required to search VIAF for non-US authors BEFORE establishing these in the LC/NAF. Currently PCC and LC catalogers are not required to search VIAF before creating a Name Authority record to contribute to NACO, but are encouraged to do so. This may change when VIAF moves out of the beta stage; catalogers certainly are encouraged to use VIAF to resolve conflicts. We have found that searching VIAF is an efficient mechanism for searching authority files, and LC is encouraging our catalogers and others to use VIAF in order to provide more consistency in authorized access points.
  • #32: See slide – we are not including topical terms, because experience in trying to map vocabularies across thesauri in several projects has shown it is not worth doing – and there are other ways to offer users suggestions from multiple thesauri that are linked to the same bibliographic data – such as through faceted systems. However, LC is making its subject authority data available freely on the Web.
  • #33: We are doing that through the use of SKOS – Simple Knowledge Organization System. In particular we have posted the Library of Congress Subject Headings in SKOS Format on the Web. SKOS is (see slide)
  • #34: See slide
  • #35: See slide
  • #36: See slide
  • #37: See slide
  • #38: See slide
  • #39: See slide
  • #40: Here is the Web site for our “Authorities & Vocabularies” – also known as id.loc.gov
  • #41: Here’s the page you see to search LCSH – notice there is also a tab <click> to enable anyone to suggest subject terminology or alert LC to any problems.
  • #42: Notice that when I searched for animated films, I got back 3 possibilities, each with a different LC Control Number (LCCN) prefix: sh (LCSH topical heading), sj (Children’s Subject Headings), and g/f (Genre/Form).
  • #43: See slide
  • #44: Note each element in the MARC subject heading record is displayed in a more user-friendly form as portions of this textual display – again converting the MARC records to display data.
  • #45: We also offer the ability to visualize the LCSH terms using AquaBrowser: blue = narrower terms, green = broader terms, pink = related terms. You can click on these and move them around to explore the relationships.
  • #46: Our contacts (see slide)
  • #47: See slide
  • #48: The last controlled vocabulary I’ll talk about today, is the various terms used in descriptive cataloging and found in the new cataloging code RDA: Resource Description and Access. This new cataloguing code replaces AACR2 and has been developed as a Web product – the publishers call the cataloging code plus the complementary tools, the “RDA Toolkit.” RDA was developed with the Web in mind. It gives instructions for identifying all of the things in our bibliographic universe, to use as we describe the things in our collections, the library collections, the associated persons, families, corporate bodies, concepts, objects, events, and places. The elements and sometimes specific values are included as controlled vocabularies, like the list of names of languages – English, French, Spanish, etc. (for us in the US, we take the list of languages from the MARC format), or the list of types of carriers, like computer discs, audio files, and so on.
  • #49: The controlled vocabularies and the list of elements included in RDA are posted on a Web registry run by the Open Metadata Registry – as shown here.
  • #50: Here’s what the registry looks like when you search the basic elements – things like the carrier type. These basic elements in RDA are the various identifying characteristics covered in the cataloging instructions, and
  • #51: for each element, there is a URI (Uniform Resource Identifier), which can be used in descriptions instead of the text string. This is helpful for displaying the language and script a user wants to see, because the URI can be converted into a display of a text string in English or German, or whatever language. For now it’s only English and German. Spanish and French are coming soon. The Deutsche Nationalbibliothek is providing the German equivalent terms for the RDA element sets registered on the Web, and the same URI identifying Carrier type can be displayed in German to suit the application or user profile.
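The idea of storing a URI in the record and choosing the display string at presentation time can be sketched in Python as below. The URI, the label table, and the function name are assumptions made for illustration; they are not taken from the Open Metadata Registry, and the German label is shown only as an example value.

```python
# Hypothetical label table keyed by element URI, then by language code.
# A real application would load these labels from the registry.
LABELS = {
    "http://example.org/elements/carrierType": {
        "en": "Carrier type",
        "de": "Datenträgertyp",
    },
}


def display_label(uri, preferred_languages, labels=LABELS):
    """Return the label for a URI in the first preferred language available.

    Falls back to the URI itself when no label is registered in any of the
    user's languages, so the display never comes back empty.
    """
    by_language = labels.get(uri, {})
    for language in preferred_languages:
        if language in by_language:
            return by_language[language]
    return uri


if __name__ == "__main__":
    carrier = "http://example.org/elements/carrierType"
    print(display_label(carrier, ["de"]))        # German user profile
    print(display_label(carrier, ["es", "en"]))  # Spanish not registered yet
```

Because the record holds only the URI, adding Spanish or French later means adding labels to the table, with no change to the bibliographic data.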
  • #52: Going the next step beyond just identifying the element, to identifying the specific values or types or terms that can be used when identifying that element – for example, here we see the first page of terms that are carrier types – the green checkmarks are the broad categories and more specific terms are below each. In addition to the RDA registry, we will get the MARC list of languages from the id.loc.gov “Authorities & Vocabularies” from the Library of Congress.
  • #53: The vocabulary for the controlled names of languages, used in RDA to identify the expression, can be displayed using a linked URI from data in the bibliographic or authority records – and depending on the user’s view – it can be displayed in English as shown here, or any other language, such as
  • #54: in Spanish. By having the elements identified and the various language equivalents registered, the displays can be appropriate to the context – a user’s profile can alert the system which language/script should be displayed. I look forward to new systems that will help users navigate the library collections of the world as part of searching the Web. RDA is another of the building blocks to provide the identifying characteristics of the things in our bibliographic universe and to provide the relationships among those things so we can offer displays and pathways for users to explore our collections and the resources available to them worldwide -- so they can find what they need in global Web-based systems – in whatever language or script they want.
  • #55: Google and Yahoo work in a cloud computing environment with linked data and a global perspective, and now libraries are finally getting into the act. I hope to see really creative services emerge that take advantage of the building blocks libraries provide through our controlled vocabularies and bibliographic data, so users can connect to our collections. Thank you for your attention.