Using Open Source Tools to Improve Access to Oral History Collections
Becky Yoose, Bibliographic Systems Librarian, and Jody Perkins, Metadata Librarian
Miami University Libraries, Miami University, Oxford, OH


Poster: Using Open Source Tools to Improve Access to Oral History Collections

Miami Stories Oral History Project

  • Begun in 2005, coordinated by University Archives; CONTENTdm collection maintained by Digital Initiatives
  • Current and former students, faculty, and staff, as well as friends of the University, share recollections of their Miami years
  • 100 videotaped interviews
  • Average length of interview: 2 hours
  • Half have been fully transcribed

OpenCalais

  • Released in 2008; used by various companies, news agencies, and publishers
  • Uses natural language processing and machine learning to extract categorized metadata (in RDF format) from full-text documents
  • API, modules, and applications available for different platforms

Drupal

  • Popular open source content management system (CMS) built with PHP
  • Used widely for web sites and blogs
  • Flexible and customizable, over 8,000 modules

Miami Stories OpenCalais Pilot

Human metadata creation workflow: Each interviewer had a cover sheet to list key terms and topics relevant to the interview. These terms were entered as keywords into item records and supplemented with FAST (Faceted Application of Subject Terminology) headings, a controlled vocabulary based on Library of Congress Subject Headings and related LC Authority files.

Human metadata creation issues: Data collected on cover sheets varied with the amount of time and number of staff available for a given interview; the sheets ranged from no data to over 50 topics for a single interview. Name entries, even for interviewees, were inconsistent. The Libraries did not have the staff to go through 60+ interview transcripts and extract metadata manually.

Pilot project goal: Experiment with applications that automatically generate index terms from full text as a more efficient way to provide item-level access points for this collection. The OpenCalais API offered a number of advantages that made it ideal for this purpose. The faceting framework it employs uses categories that are essential to these kinds of historical collections: names, places, and locations in particular.

Outcome

OpenCalais (OC) can provide substantial efficiencies when working with large volumes of full text, especially for collections where terms representing people, organizations, facilities, and locations are deemed critical access points.

Possible Next Steps

Data quality study: Measure data quality using established criteria for:

  • Aboutness / substantive coverage
  • Usability

Integration, display, and sharing: Currently the Oral History Project is hosted on CONTENTdm; however, the Libraries are in the process of migrating several collections to DSpace. In light of this move, the metadata generated from this project, along with the videos, transcripts, and descriptive metadata, might be calling one of the following platforms “home” in the near future:

  • Drupal
  • Omeka http://omeka.org/
  • OpenWMS http://rucore.libraries.rutgers.edu/open/projects/openwms/

Want to learn more about the technical details of this project? Scan the QR code or visit http://bit.ly/hYlEHD for more information!

Migrating transcripts into Drupal & batch processing using the Calais Drupal module in six [oversimplified] easy steps!

Step 1: Export XML from CONTENTdm to MySQL
Step 2: Import MySQL table using Table Wizard
Step 3: Migrate content into Drupal using Migrate
Step 4: Edit Calais node settings
NB: We set the Relevancy Threshold to return the maximum number of terms for the project.
Step 5: Batch process transcripts using Calais
Step 6: Profit!

Observations

  • OC generates a much larger number of access points, but OC results also included a larger number of false hits/inaccuracies
  • OC categories provide a less granular browsing structure
  • Terms representing contextual and relational information are lacking in OC results
  • Certain aspects of the OC schema don’t suit the content (many irrelevant categories), and there are numerous gaps when compared to the cataloger-created metadata
  • The meaning of many OC categories is ambiguous, making index terms difficult to interpret
  • Preservation and genre metadata are not captured (since OC only processes text)
  • Subject indexing seems to be a weakness of OC: it generated only a few very broad terms, though it did so with a great deal of accuracy
  • Name indexing (people, organizations, facilities, and locations) seems to be a real strength of OC

For further information

Miami Stories Oral History Project: http://doyle.lib.muohio.edu/cdm4/mustories/
OpenCalais: http://www.opencalais.com/
Drupal: http://drupal.org/

Becky Yoose, Bibliographic Systems Librarian, yoosebj@muohio.edu
Jody Perkins, Metadata Librarian, perkintj@muohio.edu

Sample of assigned subjects

CONTENTdm topics

  • Anti-Vietnam War protests
  • Faculty leaving the University
  • Long hair and beards on men
  • Music
  • Technology (false hit)

Sample of name entries issues

Although the table of name entry issues highlights problems in the OC output, it should be noted that OC on average created more name entries than catalogers did.
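
A comparison like the one described on this poster, cataloger-assigned terms versus OC-generated terms for a single interview, could be tallied with a short script. The sketch below is only an illustration, not the workflow the project used: the function names, the normalization rules, and the sample names are all hypothetical, and treating the cataloger terms as the reference set for "precision" and "recall" is an assumption made here for convenience.

```python
import re


def normalize_name(name: str) -> str:
    """Reduce a name entry to a comparable form (hypothetical rules).

    Inverted entries such as "Last, First" are flipped to "first last",
    punctuation is dropped, and whitespace is collapsed.
    """
    name = name.strip()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    name = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(name.split())


def compare_terms(cataloger_terms, machine_terms):
    """Compare two term lists for one item after normalization.

    Assumption: cataloger terms are used as the reference set, so
    "precision" is the share of machine terms also assigned by a cataloger
    and "recall" is the share of cataloger terms the machine also produced.
    """
    cat = {normalize_name(t) for t in cataloger_terms}
    mach = {normalize_name(t) for t in machine_terms}
    shared = cat & mach
    return {
        "cataloger_count": len(cat),
        "machine_count": len(mach),
        "shared": sorted(shared),
        "machine_only": sorted(mach - cat),    # candidates for false hits or new access points
        "cataloger_only": sorted(cat - mach),  # gaps in the machine output
        "precision": len(shared) / len(mach) if mach else 0.0,
        "recall": len(shared) / len(cat) if cat else 0.0,
    }


if __name__ == "__main__":
    # Placeholder names only; these are not entries from the collection.
    cataloger = ["Doe, Jane", "Public, John"]
    machine = ["Jane Doe", "J. Doe", "Oxford"]
    for key, value in compare_terms(cataloger, machine).items():
        print(f"{key}: {value}")
```

Terms that land in `machine_only` are not necessarily errors; they would still need human review to separate false hits from useful access points that the cataloger did not record.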