Improving Access to Special Collections
by Automating Descriptive
Metadata Creation
Anna Neatrour, Metadata Librarian
Jeremy Myntti, Head of Digital Library Services
Betsey Welland, Manuscripts Archivist
Jessica Breiman, AV Archivist
#ula2016
Covering today
Background information about Special
Collections and Digital Library Services
Overview of ways we are processing data:
• Extracting data from finding aids for
item level description
in digital collections
• Extracting and Analyzing
Names/Subjects in EAD
• EAD to MARC record transform
through MarcEdit
• Techniques for working with legacy
data
Resource page where you can download
examples
P0790 Shipler Studio Photograph Collection
Special Collections at Marriott Library
• 6 departments: Print & Journal, Rare
Books, Manuscripts, Photos, AV,
University Records
• 4 depts. produce finding aids; each
department produces its own finding aids
even if items come from the same donor
• Finding aids published in Archives West;
4,000 and counting
• Onsite storage; storage in Automated
Retrieval Center; and offsite building
P0244 Olive Woolley Burton Photograph Collection
Special Collections at Marriott Library
• Despite the size of our collections, we have
no Archival Management System
• Finding aids encoded by hand in XML
• Former process: EADs were printed out and
hand-delivered to librarians in cataloging
• Doesn’t make sense for metadata librarians
to recreate item-level work that has already
been done.
• Sought help of XSLT and MarcEdit
wizards!
University of Utah Archival Photograph Collection;
Departments -- Computer Science
Technical background
• XSLT - Extensible Stylesheet Language Transformations - Used for
transforming XML documents into other formats
• Excel formulas - allow you to extract and concatenate data in a variety of
ways
• Low cost/free tools like Oxygen and MarcEdit
• Examples, tips, and learning resources
at the end of the presentation!
P0206 Rocky Mountain Power and Light Company
Photograph Collection
Extracting descriptive metadata
from EAD Finding Aids
P0790 Shipler Studio Photograph
Collection
Finding Aids
Container area of EAD = item level metadata
Structure of EAD makes it easy to extract data
XSLT (Extensible Stylesheet Language
Transformations)
Using Oxygen to extract metadata
Currently we process this file
with formulas in Excel
We could rewrite the XSLT
to handle this sort of
transformation too
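The extraction step above can be sketched in a few lines. The presenters do this with XSLT in Oxygen; the Python below is only an analogue of that idea, not their stylesheet. The sample finding aid, its un-namespaced EAD 2002 tags, and the column names are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, un-namespaced EAD fragment; a real finding aid will be larger
# and may use a namespace or deeper component levels (c02, c03, ...).
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <dsc>
      <c01 level="item">
        <did>
          <container type="box">1</container>
          <container type="folder">3</container>
          <unittitle>Main Street, Salt Lake City</unittitle>
          <unitdate>1905</unitdate>
        </did>
      </c01>
      <c01 level="item">
        <did>
          <container type="box">1</container>
          <container type="folder">4</container>
          <unittitle>Saltair Pavilion</unittitle>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

FIELDS = ["box", "folder", "title", "date"]


def extract_items(ead_xml):
    """Return one dict per <c01> component found in the container list."""
    root = ET.fromstring(ead_xml)
    rows = []
    for component in root.iter("c01"):
        did = component.find("did")
        if did is None:
            continue
        # Map each container's type attribute (box, folder) to its value.
        containers = {
            c.get("type"): (c.text or "").strip() for c in did.findall("container")
        }
        rows.append({
            "box": containers.get("box", ""),
            "folder": containers.get("folder", ""),
            "title": (did.findtext("unittitle") or "").strip(),
            "date": (did.findtext("unitdate") or "").strip(),
        })
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-delimited text that Excel can import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


items = extract_items(SAMPLE_EAD)
print(to_tsv(items))
```

A tab-delimited file is used here because the next step in the workflow is opening the output in Excel.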
Excel for additional processing
See our sample spreadsheet for formulas you can repurpose!
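The kind of splitting done with Excel formulas can also be sketched in Python. This is a rough illustration rather than the presenters' spreadsheet: the paragraph layout ("Name (b. YYYY). N photographs."), the sample person, and the function names are all hypothetical.

```python
import re


def invert_name(name):
    """Turn 'First Middle Last' into 'Last, First Middle'."""
    parts = name.strip().split()
    if len(parts) < 2:
        # Single-word names are returned unchanged.
        return name.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def split_description(text):
    """Split one assumed-format description paragraph into separate fields."""
    # Creator: everything before the first opening parenthesis.
    name_match = re.match(r"\s*([^(.]+?)\s*\(", text)
    # Birth date: a four-digit year after 'b.' or 'born'.
    birth_match = re.search(r"\bb(?:orn|\.)\s*(\d{4})", text)
    # Extent: a count followed by a unit word.
    extent_match = re.search(
        r"(\d+)\s+(photographs?|negatives?|prints?|items?)", text
    )
    return {
        "creator": invert_name(name_match.group(1)) if name_match else "",
        "birth_date": birth_match.group(1) if birth_match else "",
        "extent": (
            f"{extent_match.group(1)} {extent_match.group(2)}"
            if extent_match else ""
        ),
    }


# Hypothetical description paragraph, not taken from a real finding aid.
record = split_description("Jane Example (b. 1878). 12 photographs.")
print(record)
```

Paragraphs that do not follow the assumed layout come back with empty fields, which makes them easy to filter for manual review.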
Extracting and Analyzing
Names/Subjects in EAD
P0016 Pro-Utah, Inc. Photograph Collection
XSLT to extract data
Create spreadsheets of the data
Deduplicate to see most commonly used values
Reconcile using OpenRefine
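The extract-and-count steps above might look like the following sketch. It is a Python stand-in for the XSLT and spreadsheet work described in the talk, and it stops before the OpenRefine reconciliation. The three tiny EAD documents, the "rules" values, and the helper names are assumptions; only the "Smith, Joseph, 1805-1844" heading comes from the presentation.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# EAD elements treated as name/subject headings in this sketch.
HEADING_TAGS = ("persname", "corpname", "famname", "geogname", "subject", "genreform")


def make_ead(headings_xml):
    """Wrap heading elements in a minimal, hypothetical EAD document."""
    return (
        "<ead><archdesc><controlaccess>"
        + headings_xml
        + "</controlaccess></archdesc></ead>"
    )


def extract_headings(ead_xml):
    """Return (type, heading, source, rules) tuples from <controlaccess>."""
    root = ET.fromstring(ead_xml)
    rows = []
    for block in root.iter("controlaccess"):
        for element in block:  # direct children only, so nesting is not double counted
            if element.tag in HEADING_TAGS:
                text = " ".join("".join(element.itertext()).split())
                rows.append(
                    (element.tag, text, element.get("source", ""), element.get("rules", ""))
                )
    return rows


def count_headings(ead_documents):
    """Count how many finding aids use each distinct heading."""
    counts = Counter()
    for document in ead_documents:
        # A set ensures a heading repeated within one EAD is counted once.
        counts.update(set(extract_headings(document)))
    return counts


smith = '<persname source="lcnaf" rules="aacr2">Smith, Joseph, 1805-1844</persname>'
smith_jr = '<persname source="lcnaf" rules="rda">Smith, Joseph, Jr., 1805-1844</persname>'
skiing = '<subject source="lcsh">Skis and skiing</subject>'

counts = count_headings([make_ead(smith + skiing), make_ead(smith), make_ead(smith_jr)])
for (kind, heading, source, rules), n in counts.most_common():
    print(n, kind, heading, source, rules)
```

Sorting the counted headings side by side is what exposes near-duplicates such as the two forms of the same personal name.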
Uses for this type of data
● Identify NACO work to be completed
● Find inconsistencies in name/subject usage
● Identify typos or other problems to fix
P0413 Alan K. Engen Photograph Collection
EAD to MARC using
XSLT and MarcEdit
P0206 Rocky Mountain Power and Light Company
Photograph Collection
Generating MARC records from EAD
Since we don’t have an archival management system to handle this
automatically, this is a functional workaround.
Take EAD, run transform in MarcEdit with local stylesheet, edit the resulting
draft MARC record to meet local standards.
XSLT to transform XML to MARC
Generating MARC records from EAD
Use MARC Tools to load custom XSLT
Generate MARC record for editing
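To make the idea of an EAD-to-MARC crosswalk concrete, here is a small sketch. It is not the presenters' stylesheet and does not call MarcEdit; it only shows how a few collection-level EAD elements might map to draft MARC fields, printed in a mnemonic-style text layout (tag, indicators, "$" subfields, with a backslash for a blank indicator). The sample record and the four-field mapping are assumptions, and a cataloger would still need to review indicators and add the remaining fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection-level description, not a real finding aid.
SAMPLE_EAD = """<ead><archdesc level="collection"><did>
<unittitle>Example family photograph collection</unittitle>
<origination><persname>Example, Jane, 1878-1950</persname></origination>
<physdesc><extent>12 photographs</extent></physdesc>
<abstract>Portraits and street scenes.</abstract>
</did></archdesc></ead>"""

# (path under the collection-level <did>, MARC tag, indicators, subfield code)
# Assumed mapping for illustration; a local stylesheet would cover many more fields.
MAPPING = [
    ("origination/persname", "100", "1\\", "a"),   # main entry, personal name
    ("unittitle", "245", "10", "a"),               # title statement
    ("physdesc/extent", "300", "\\\\", "a"),       # physical description
    ("abstract", "520", "3\\", "a"),               # summary note (abstract)
]


def ead_to_draft_fields(ead_xml):
    """Return draft MARC field lines built from the collection-level <did>."""
    did = ET.fromstring(ead_xml).find("archdesc/did")
    lines = []
    if did is None:
        return lines
    for path, tag, indicators, code in MAPPING:
        element = did.find(path)
        if element is None:
            continue
        # Collapse line breaks and runs of spaces inside the element text.
        value = " ".join("".join(element.itertext()).split())
        if value:
            lines.append(f"={tag}  {indicators}${code}{value}")
    return lines


marc_lines = ead_to_draft_fields(SAMPLE_EAD)
print("\n".join(marc_lines))
```

Keeping the mapping in one table makes it easy to see, and to adjust, which EAD element feeds which MARC field.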
Working with Legacy Data
P0305 University of Utah Archives Photograph
Collection – A-Fa. -- Thomas Stockham
Working with legacy data
• Improves access to collections
• Gives archivists more accurate data on items/formats in the collection
• Helps archivists assess which items are most in need of preservation/digitization
University of Utah Archival Photograph Collection; Departments -- Computer Science
AV Archives Legacy Metadata (the “before” shot)
● Hundreds of documents
● Still in WordPerfect format
● Somewhat structured
A little better with Notepad++
Even better with Excel...
Contents List in EAD
Item level, no series, boxes, etc.
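The last step, turning cleaned-up spreadsheet rows into an item-level contents list, can be sketched as follows. The column names, the sample rows, and the choice of EAD elements (unitid, unittitle, genreform) are assumptions for illustration; the slides do not show the actual encoding used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rows exported from Excel as tab-delimited text.
LEGACY_TSV = (
    "id\ttitle\tformat\n"
    "A0001\tCampus tour\t16mm film\n"
    "A0002\tCommencement\tVHS\n"
)


def rows_to_dsc(tsv_text):
    """Build a flat <dsc> with one item-level <c01> per spreadsheet row."""
    dsc = ET.Element("dsc", {"type": "in-depth"})
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # No series or box levels: every row becomes its own item.
        component = ET.SubElement(dsc, "c01", {"level": "item"})
        did = ET.SubElement(component, "did")
        ET.SubElement(did, "unitid").text = row["id"]
        ET.SubElement(did, "unittitle").text = row["title"]
        physdesc = ET.SubElement(did, "physdesc")
        ET.SubElement(physdesc, "genreform").text = row["format"]
    return dsc


dsc = rows_to_dsc(LEGACY_TSV)
print(ET.tostring(dsc, encoding="unicode"))
```

The resulting fragment would still need to be pasted into a full finding aid and validated against the EAD schema.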
Conclusion
• Link to resource site:
https://sites.google.com/site/specialcollectionsmetadat
• Repurposing data can help streamline processes and
speed up descriptive metadata creation.
• Eliminate time spent reformatting or copying and pasting
information.
• Doesn’t require a great deal of technical background to
implement these solutions.
• If you are doing a great deal of copying and pasting
there is probably an easier, more efficient way of doing
this work.
• Google your problem, you will be surprised at the helpful
resources you can find.
• http://www.libraryworkflowexchange.org/ - collects
resources in this area.
P0244 Olive Woolley Burton Photograph Collection

Editor's Notes

  • #2: Jessica Breiman
  • #3: Jessica Breiman
  • #4: Jessica
  • #5: Jessica Breiman
  • #6: ANNA
  • #7: ANNA - I’m going to talk a little bit here about how we use the existing descriptive information that we have in EADs in order to create item level descriptive information in our digital library.
  • #8: ANNA Here’s an example of one of the EADs that has been used for this process
  • #9: ANNA - When we want to repurpose EAD data, we mainly want to use the item-level entries in the container list.
  • #10: ANNA Who doesn’t like looking at XML! Here’s another view of the data before it is repurposed.
  • #11: ANNA - XSLT for this is pretty simple, grabbing the data in specific tags from the EAD file. We will give you our sample XSLT to download at the end of the presentation. We do this type of transformation work in the XML editor Oxygen.
  • #12: ANNA - Example of text file output from Oxygen.
  • #13: ANNA - Import the text file into Excel, then use formulas to split out the data, such as putting the creator name in last, first format, extracting the birth date, and capturing the extent. All of this was originally in one paragraph in the container list.
  • #14: Jeremy’s notes Since we don’t have an archival management system that keeps track of our EADs and everything is hand coded XML, it is difficult to keep up to date with standardizing the forms of names and subjects used in these EADs. We wanted to find a way to extract the names and subjects from all 4000 EADs so we could analyze them to find inconsistencies, to reconcile them against LCNAF and LCSH, and to discover the most often used names/subjects.
  • #15: Jeremy’s notes This XSLT will extract the different types of names and subjects from the EAD controlled access (<controlaccess>) fields, including the type of heading (e.g. personal name, corporate body, topical subject, form/genre, etc.), the controlled vocabulary that the heading should reside in (LCNAF, LCSH, ArchivesWest), and the cataloging rules used (AACR2, RDA, DACS).
  • #16: Jeremy’s notes After running that XSLT on all of our EADs and compiling it into one large spreadsheet, we had something like this. After sorting the data, we could see where duplicate headings should have been used, but there were minor variations such as missing a death date or including different subdivisions with the subjects.
  • #17: Jeremy’s notes We then deduplicated the spreadsheet to get a count of how many EADs use each of the different headings. This was interesting to see so we could discover what the most commonly used terms were for names and subjects in our records.
  • #18: Jeremy’s notes We then used OpenRefine to reconcile the data against the Library of Congress Name Authority File and Library of Congress Subject Headings. This helped us identify the names that may have been updated in the LCNAF from AACR2 to RDA that need to be updated in our EADs. For example, the first name listed in this spreadsheet is “Smith, Joseph, 1805-1844” but since that LCNAF record has been updated for RDA, we knew that there were 28 EADs that needed “Jr.” added to the heading.
  • #19: Jeremy’s notes Some of the major outcomes of this project were that we were able to identify names that we could potentially do NACO work on if they didn’t reconcile to a record in the LCNAF, we were able to identify inconsistencies in records where the same name or subject has been used but in different forms, and we were able to find simple typos or other problems that could be fixed to make the data cleaner.
  • #20: Betsey:
  • #21: Betsey: Since we do not have an archival management system to handle the generation of MARC records from EAD, Anna helped to create a workaround that uses a software program called MarcEdit. If you are unfamiliar with this program, it was created by Terry Reese and is free to download. There will be a link available if you would like to gain access to the program.
  • #22: Betsey: Anna also created the XSLT stylesheet that generates the MARC record. In my mind, since I am not familiar with XSLT, this is the hardest step of the entire process. Luckily, Anna created one for us that we would be happy to share with anyone who would like an example.
  • #23: Betsey: Once the program has been installed, and you have both your XSLT stylesheet and your Encoded Archival Description (EAD) XML file, simply select the MARC Tools option.
  • #24: Betsey: In MARC Tools, make sure that your specific stylesheet is selected under the “Functions” option, that the “input” file is the EAD file you would like to transform, and that the “output” file will be saved as a MARC file.
  • #25: Betsey: Once the MARC file has been generated, it is essential to work with a cataloger who will perform quality control over all files. Currently, I work with a cataloger who reviews the files that I generate. He checks the file, adds content if necessary, makes corrections, and then uploads the file to Alma so that it is discoverable in Primo and WorldCat.
  • #26: JESSICA
  • #27: JESSICA
  • #28: JESSICA
  • #29: JESSICA
  • #30: JESSICA
  • #31: JESSICA
  • #32: Jessica