The Differences Problem
Or why consistency in metadata is critical in the discovery process
Shana L. McDanold
First, a few caveats…

In the not so distant past…
There were two main options when searching for ebooks:
1. Search each individual vendor’s website/database
2. Load MARC records (one record for each title) into the
catalog for each vendor

In the not so distant past…
Problems with this approach:
• Loading records is a LOT of work and requires regular maintenance
  • Massaging/editing/enhancing metadata; loading; updates; replacements; deletes
• Number of records/titles to load
• Lack of records available for loading
• Records come from numerous places and each vendor requires a different procedure to download files
• Tracking titles in multiple places (duplicate work)

Now: more options…
1. Search each individual vendor’s website/database
2. Load MARC records (one record for each title) into the
catalog for each vendor
3. Integration of various vendors’ metadata into
discovery layers via APIs and linked data rather than
importing records into the catalog
4. Federated search tools that index multiple databases
(e.g. unified index search tools)
…but are more options better?

The good and the bad
GOOD:
• fewer places to search (possibly even only one)
• most public libraries, while they have other ebook databases, will have a single integrated discovery layer
BAD:
• MORE places to search
BUT discovery is still a challenge no matter which search option you choose, and those challenges are centered around:
METADATA

Print book

Ebook

Differences?
• ISBN
• Subjects
• Title
• Author
• Date

Print book

Ebook

Differences?
• ISBN
• Subjects
• Title
• Author
• Date

Print book

Ebook

Differences?
• ISBN
• Subjects
• Title
• Author
• Date
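
Two of the data points listed above, ISBN and Date, can often be normalized before records are compared. The following is a minimal Python sketch, not a description of how any particular discovery layer or resolver works. The function names `to_isbn13` and `publication_year` are hypothetical, and it assumes the values arrive as plain strings with an optional qualifier after the number (for example "(ebook)").

```python
import re
from typing import Optional


def to_isbn13(raw: str) -> Optional[str]:
    """Return a hyphen-free ISBN-13, converting from ISBN-10 when needed.

    Assumption: the ISBN is the first token in the string and uses
    hyphens (not spaces) as separators.
    """
    match = re.match(r"\s*([0-9Xx\-]+)", raw)
    if not match:
        return None
    digits = match.group(1).replace("-", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        # Prefix with 978, drop the old check digit, compute the new one
        # using the ISBN-13 weights 1, 3, 1, 3, ...
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def publication_year(raw: str) -> Optional[str]:
    """Pull a four-digit year (1500-2099) out of a free-text date value."""
    match = re.search(r"(?<!\d)(?:1[5-9]|20)\d{2}(?!\d)", raw)
    return match.group(0) if match else None


print(to_isbn13("0-306-40615-2"))
print(publication_year("c2018."))
```

Normalizing only removes formatting differences. A print book and its ebook normally carry different ISBNs, so a match still depends on both records listing the related ISBNs.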

Differences defined
• Differences in description
  • Current vs past rules and guidelines
  • RDA provider neutral vs individual vendor records
• Differences between vendors for same title
• Differences in how data is entered/presented
• Record proliferation
  • Related to metadata differences: records cannot be “collapsed” because the discovery layer doesn’t recognize them as the same
• Different vocabularies and identity databases
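
To illustrate why differently entered data stops records from being “collapsed”, here is a small Python sketch of a match key built from normalized title and author. It is an illustration under stated assumptions, not how any specific discovery layer groups records. The record dictionaries, the field names `title`, `author` and `format`, and the helpers `fold`, `normalize_name`, `match_key` and `collapse` are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fold(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def normalize_name(name):
    """Reduce 'Last, First, 1970-' and 'First Last' to the same string."""
    parts = [p.strip() for p in name.split(",")]
    # Drop segments containing digits, such as life dates ("1970-").
    parts = [p for p in parts if p and not re.search(r"\d", p)]
    if len(parts) >= 2:  # inverted form: Last, First
        parts = [parts[1], parts[0]]
    return fold(" ".join(parts))


def match_key(record):
    """Key on the main title (text before a subtitle or statement of
    responsibility) plus the normalized author. A simplification."""
    main_title = re.split(r"\s*:\s+|\s/\s", record["title"], maxsplit=1)[0]
    return (fold(main_title), normalize_name(record["author"]))


def collapse(records):
    """Group records that share a match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return dict(groups)


# Invented sample records for illustration only.
records = [
    {"title": "A Sample Title : a memoir", "author": "Doe, Jane, 1970-", "format": "print"},
    {"title": "A sample title", "author": "Jane Doe", "format": "ebook"},
]
for key, group in collapse(records).items():
    print(key, [r["format"] for r in group])
```

Without the normalization step the two sample records produce different keys and display as two separate results. A key this loose can also merge records that should stay apart, which is one reason controlled vocabularies and authority data are the more reliable fix.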

More differences
• Missing metadata/missing records
• Data changes/updates
• Branding or custom text/collections

Why do these differences matter?
• How people search
  • Keyword - forces dependency on keyword indexes
  • Follow links - if you click on the subject search for Obama, Michelle, search results include only print books (no ebooks)
  • Limits/facets - dependent on metadata, both visible and invisible (coded)
• Missing metadata
• Discovery layer exposes ALL the metadata (good, bad, missing)
All of this means items get “hidden” because they’re not findable.

How do we fix it?
• CONSISTENCY
  • Use of controlled vocabularies and existing authority databases (name matching, subjects, etc.)
  • Use existing metadata sources
  • Follow standards and recommended/best practices
• Communication
• Data points
  • complete
  • consistency across vendors
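
One way to check whether data points are complete and consistent across vendors is a simple per-vendor audit. The Python sketch below counts missing values for the five data points compared in the earlier slides (ISBN, Subjects, Title, Author, Date). The dictionary layout, field names, vendor labels and the functions `missing_data_points` and `audit` are assumptions made for illustration, not part of any real vendor feed or tool.

```python
REQUIRED_DATA_POINTS = ("isbn", "title", "author", "subjects", "date")


def missing_data_points(record, required=REQUIRED_DATA_POINTS):
    """List the required fields that are absent or empty in one record."""
    missing = []
    for field in required:
        value = record.get(field)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only values as empty
        if not value:
            missing.append(field)
    return missing


def audit(records_by_vendor):
    """Count, per vendor, how many records lack each required data point."""
    report = {}
    for vendor, records in records_by_vendor.items():
        counts = {field: 0 for field in REQUIRED_DATA_POINTS}
        for record in records:
            for field in missing_data_points(record):
                counts[field] += 1
        report[vendor] = counts
    return report


# Invented sample data for illustration only.
sample = {
    "vendor_a": [
        {"isbn": "9780306406157", "title": "A Sample Title",
         "author": "Doe, Jane", "subjects": ["Memoirs"], "date": "2018"},
    ],
    "vendor_b": [
        {"isbn": "", "title": "A sample title",
         "author": "Jane Doe", "subjects": [], "date": " "},
    ],
}
for vendor, counts in audit(sample).items():
    print(vendor, counts)
```

A report like this gives libraries and vendors something concrete to discuss, which ties the "Data points" item back to "Communication".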

Questions?


Editor's Notes

  • #3: Usually differences are a GOOD thing, providing diversity, but not in this case. Caveat: I am speaking mainly from a public library perspective, although most of the issues public libraries have are also present in academic environments; the differences are in resource types and in the focus on currency/popularity of materials (the collection is more ephemeral than permanent). BUT my background is serials and nonprint format cataloging: I have been managing metadata/cataloging for ejournals and ebooks for almost 2 decades now. My philosophy: the job of cataloging/metadata is to make stuff findable, which includes unique identification of resources. I don’t believe in the “perfect” record. If it’s not wrong, leave it alone (don’t delete data, just exclude it from indexes; you may want it in the future). When editing: fix errors, or delete if wrong; add access points; enhance content/description (add value); make it pretty.
  • #5: The number of vendors increased, so the work became more complex and took more time. Each vendor has a different procedure for downloading and needs different edits (some need a proxy added, some don’t); files may come in various formats and require conversion to MARC. Tools help streamline the work (MarcEdit TASK LISTS that save the edits for each vendor are a savior), BUT it is still very time consuming. Multiple places: the ERM, the catalog, and possibly the vendor website all have to be kept in sync.
  • #7: Looking at a single search option for ebooks and print books, where an API is used to search both the ebook vendor and the catalog in one search. So let’s look at examples; the examples are current popular titles or authors.
  • #8: Who’s watching the show on Netflix?
  • #10: ISBN: this is often a key match point for OpenURL resolvers or other API/linked data tools. Title: the ebook version is incomplete. Author: the translator is missing, an issue when looking for a specific translation or when searching by translator name. Date format: an indexing issue; how does your system handle dates?
  • #13: ISBN: this is often a key match point for OpenURL resolvers or other API/linked data tools. Title: the ebook version is incomplete. Author: indexing issues; identity management/authority control issues. Date format: an indexing issue; how does your system handle dates?
  • #16: ISBN: this is often a key match point for OpenURL resolvers or other API/linked data tools. Subject: where’s DC?? Title: the ebook version is different. Author: indexing issues; identity management/authority control issues. Date format: an indexing issue; how does your system handle dates? Do you see a trend yet?
  • #17: Description: AACR2 vs RDA is a fundamental change in how you approach describing a resource. Provider neutral: one record for ALL online versions of a title (formats, platforms, etc.), with multiple links/URLs to the various options; this is hard to do with APIs/linked data tools. How data is entered: date format, author format (last, first or first last?). Proliferation: more vendors = more records; we get patron complaints about ebook display all the time. Different vocabularies and identity databases (name formats, subjects, locations, etc.) create indexing and filing issues, such as split indexes.
  • #18: Missing: sometimes records just don’t appear (API/linked data tool errors, delays). Data changes: records get “out of sync”; the print book record may be complete while the ebook record is still minimal/prepublication. Branding: you can’t add custom text to create collections, or add other data to ebook records; there are limits to control over display and over what data is included, so you are stuck with what the vendor sends/makes available.
  • #19: Forcing dependency on keyword indexing, or on indexing of the WHOLE record, means specific indexes (author, etc.) stop being useful. How people search: Subjects/identities: FORM matters; “see also”; collections. Links: they find something they want/like and follow links to “similar” or “like” items using subjects, authors, etc. (internet rabbit hole…). Limits/facets: such as format, publication date, location, etc. Missing metadata: subjects, ISBN, names, locations, etc.; you lose match points, which may result in records not appearing (search by ISBN and the ebooks don’t show up). Discovery layers: good at exposing EVERYTHING (a great way to identify database cleanup projects…).
  • #20: Communication: between libraries and vendors. Data points: more is better, even if they don’t display.