The Differences Problem
Or why consistency in metadata is critical in the discovery process
Shana L. McDanold
First, a few caveats…
In the not-so-distant past…
There were two main options when searching for ebooks:
1. Search each individual vendor’s website/database
2. Load MARC records (one record for each title) into the
catalog for each vendor
In the not-so-distant past…
Problems with this approach:
• Loading records is a LOT of work and requires regular maintenance
   ◦ Massaging/editing/enhancing metadata; loading; updates; replacements; deletes
• Number of records/titles to load
• Lack of records available for loading
• Records come from numerous places, and each vendor requires a different procedure to download files
• Tracking titles in multiple places (duplicate work)
Now: more options…
1. Search each individual vendor’s website/database
2. Load MARC records (one record for each title) into the
catalog for each vendor
3. Integration of various vendors’ metadata into
discovery layers via APIs and linked data rather than
importing records into the catalog
4. Federated search tools that index multiple databases
(e.g. unified index search tools)
…but are more options better?
The good and the bad
GOOD:
• Fewer places to search (possibly even only one)
• Most public libraries, while they have other ebook databases, will have a single integrated discovery layer
BAD:
• MORE places to search
BUT discovery is still a challenge no matter which search option you choose, and those challenges are centered around:
METADATA
Print book
Ebook
Differences?
• ISBN
• Subjects
• Title
• Author
• Date
Print book
Ebook
Differences?
• ISBN
• Subjects
• Title
• Author
• Date
Print book
Ebook
Differences?
• ISBN
• Subjects
• Title
• Author
• Date
Differences defined
• Differences in description
   ◦ Current vs past rules and guidelines
   ◦ RDA provider neutral vs individual vendor records
   ◦ Differences between vendors for same title
   ◦ Differences in how data is entered/presented
• Record proliferation
   ◦ Related to metadata differences: records cannot be “collapsed” because the discovery layer doesn’t recognize them as the same
• Different vocabularies and identity databases
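The record proliferation point above can be illustrated with a minimal sketch of how a discovery layer might "collapse" records on a normalized ISBN. Everything here is an assumption for illustration: the record layout, the field names (`vendor`, `title`, `isbn`), the vendor names, and the helper names `isbn13` and `collapse` are hypothetical, and the ISBN is only a sample value. Real systems use more match points than this.

```python
import re
from collections import defaultdict

def isbn13(raw):
    """Return a hyphen-free ISBN-13 string, or None if raw is not usable."""
    digits = re.sub(r"[^0-9Xx]", "", raw or "").upper()
    if len(digits) == 10 and digits[:9].isdigit():
        # ISBN-10 to ISBN-13: prefix 978, drop the old check digit,
        # then recompute the check digit with alternating weights 1 and 3.
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    if len(digits) == 13 and digits.isdigit():
        return digits
    return None

def collapse(records):
    """Group records by normalized ISBN; records without one stay separate."""
    groups = defaultdict(list)
    for rec in records:
        key = isbn13(rec.get("isbn")) or ("no-isbn", rec["vendor"], rec["title"])
        groups[key].append(rec)
    return dict(groups)

# Hypothetical vendor records for one ebook, with the ISBN entered three ways.
records = [
    {"vendor": "Vendor A", "title": "Example Title", "isbn": "978-0-306-40615-7"},
    {"vendor": "Vendor B", "title": "Example title", "isbn": "9780306406157"},
    {"vendor": "Vendor C", "title": "Example Title", "isbn": "0306406152"},
    {"vendor": "Vendor D", "title": "Example Title", "isbn": None},
]

groups = collapse(records)
print(len(groups))  # 2: three records collapse, the one without an ISBN does not
```

The record with no ISBN cannot be collapsed at all, which is the missing-metadata problem in miniature. Grouping on ISBN also only helps when vendors report the same ISBN: print and ebook editions normally carry different ISBNs, so matching across formats needs other data points.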
More differences
• Missing metadata/missing records
• Data changes/updates
• Branding or custom text/collections
Why do these differences matter?
• How people search
   ◦ Keyword: forces dependency on keyword indexes
   ◦ Follow links: if you click on the subject search for Obama, Michelle, search results include only print books (no ebooks)
   ◦ Limits/facets: dependent on metadata, both visible and invisible (coded)
• Missing metadata
• Discovery layer exposes ALL the metadata (good, bad, missing)
All of this means items get “hidden” because they’re not findable.
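The "follow links" case above can be sketched as a split heading index. This is a simplified illustration, not how any particular discovery layer is built: the two records, the `subject` field, and the function names are hypothetical, and the index matches on the exact heading string only.

```python
from collections import defaultdict

# Hypothetical records for the same person, entered in two different forms.
RECORDS = [
    {"id": "print-1", "format": "print book", "subject": "Obama, Michelle"},
    {"id": "ebook-1", "format": "ebook", "subject": "Michelle Obama"},
]

def build_subject_index(records):
    """Index record ids under the exact subject string, as a heading index would."""
    index = defaultdict(list)
    for rec in records:
        index[rec["subject"]].append(rec["id"])
    return dict(index)

def follow_link(index, heading):
    """Return the record ids filed under one heading; no fuzzy matching."""
    return index.get(heading, [])

subject_index = build_subject_index(RECORDS)

# Clicking the subject link on the print record finds only the print record.
print(follow_link(subject_index, "Obama, Michelle"))  # ['print-1']

# The index is split: one person, two headings.
print(sorted(subject_index))  # ['Michelle Obama', 'Obama, Michelle']
```

The ebook is in the index, but a user who follows the link never reaches it, which is what "hidden" means here.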
How do we fix it?
• CONSISTENCY
   ◦ Use of controlled vocabularies and existing authority databases (name matching, subjects, etc.)
   ◦ Use existing metadata sources
   ◦ Follow standards and recommended/best practices
• Communication
• Data points
   ◦ Complete
   ◦ Consistency across vendors
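As a rough sketch of what the consistency and data-point checks above might look like in practice, the snippet below checks a vendor record for missing data points and looks the author up in an authority table. The table contents, the list of required fields, and the function names are all invented for illustration; a real workflow would query an existing authority database rather than a hard-coded dictionary.

```python
# Hypothetical authority table: variant form (lowercased) -> authorized heading.
AUTHORITY = {
    "michelle obama": "Obama, Michelle",
    "obama, michelle": "Obama, Michelle",
}

# Data points taken from the "Differences?" slides; the field names are assumptions.
REQUIRED = ("isbn", "title", "author", "subjects", "date")

def authorized(name):
    """Return the authorized heading for a name, or None if it is unknown."""
    key = " ".join(name.lower().split())
    return AUTHORITY.get(key)

def check_record(rec):
    """List the consistency problems found in one vendor record."""
    problems = [f"missing {field}" for field in REQUIRED if not rec.get(field)]
    if rec.get("author") and authorized(rec["author"]) is None:
        problems.append("author not in authority table")
    return problems

# Sample records; the values are placeholders, not real catalog data.
complete = {
    "isbn": "9780306406157",
    "title": "Example Title",
    "author": "Michelle  Obama",
    "subjects": ["Biography"],
    "date": "2018",
}
minimal = {"title": "Example Title", "author": "M. Obama", "subjects": [], "date": ""}

print(check_record(complete))  # []
print(check_record(minimal))
```

Running a check like this on each vendor load gives libraries and vendors something concrete to talk about: a list of which data points are missing and which names do not match.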
Questions?



Editor's Notes

  • #3: Usually differences are a GOOD thing, providing diversity, but not in this case.
    Caveat: I am speaking mainly from a public library perspective, although most of the issues public libraries have are also present in academic environments. The differences are in resource types and in the focus on currency/popularity of materials (the collection is more ephemeral than permanent). BUT my background is serials and nonprint format cataloging; I have been managing metadata/cataloging for ejournals and ebooks for almost two decades now.
    My philosophy:
    ◦ The job of cataloging/metadata is to make stuff findable, which includes unique identification of resources
    ◦ I don’t believe in the “perfect” record
    ◦ If it’s not wrong, leave it alone (don’t delete data, just exclude it from indexes; you may want it in the future)
    When editing:
    ◦ Fix errors or delete if wrong
    ◦ Add access points
    ◦ Enhance content/description (add value)
    ◦ Make it pretty
  • #5: The number of vendors increased: more complex, so more time.
    Each vendor: a different procedure for downloading; different edits (some need a proxy added, some don’t); files may be in various formats and require conversion to MARC.
    Tools help streamline (MarcEdit: TASK LISTS saving the edits for each vendor are a savior), BUT it is still very time consuming.
    Multiple places: the ERM, the catalog, and possibly the vendor website all have to be kept in sync.
  • #7: Looking at a single search option for ebooks and print books, where an API is used to search both the ebook vendor and the catalog in one search. So let’s look at examples; the examples are current popular titles or authors.
  • #8: Who’s watching the show on Netflix?
  • #10:
    ◦ ISBN: this is often a key match point for OpenURL resolvers or other API/linked data tools
    ◦ Title: ebook version is incomplete
    ◦ Author: translator is missing, an issue when looking for a specific translation or when searching by translator name
    ◦ Date format: indexing issue; how does your system handle dates?
  • #13:
    ◦ ISBN: this is often a key match point for OpenURL resolvers or other API/linked data tools
    ◦ Title: ebook version is incomplete
    ◦ Author: indexing issues; identity management/authority control issues
    ◦ Date format: indexing issue; how does your system handle dates?
  • #16:
    ◦ ISBN: this is often a key match point for OpenURL resolvers or other API/linked data tools
    ◦ Subject: where’s DC??
    ◦ Title: ebook version is different
    ◦ Author: indexing issues; identity management/authority control issues
    ◦ Date format: indexing issue; how does your system handle dates?
    Do you see a trend yet?
  • #17:
    ◦ Description: AACR2 vs RDA is a fundamental change in how you approach describing a resource. Provider neutral means one record for ALL online versions of a title (formats, platform, etc.), with multiple links/URLs to the various options; that is hard to do with APIs/linked data tools. Date format and author format (last, first or first last?) differ as well.
    ◦ Proliferation: more vendors = more records. We get patron complaints about ebook display all the time.
    ◦ Different vocabularies and identity databases: name formats, subjects, locations, etc. This creates indexing and filing issues, and split indexes.
  • #18:
    ◦ Missing: sometimes records just don’t appear (API/linked data tool errors, delays, etc.)
    ◦ Data changes: records get “out of sync”; the print book record may be complete while the ebook record is still minimal/prepublication
    ◦ Branding: can’t add custom text to create collections, or other data, to ebook records; control over display and over what data is included is limited, so you are stuck with what the vendor sends/makes available
  • #19: Forcing dependency on keyword indexing, or on indexing of the WHOLE record, means specific indexes (author, etc.) stop being useful.
    How people search:
    ◦ Subjects/identities: FORM matters
    ◦ “See also”
    ◦ Collections
    ◦ Links: find something they want/like, then follow links to “similar” or “like” items using subjects, authors, etc. (internet rabbit hole…)
    ◦ Limits/facets: such as format, publication date, location, etc.
    Missing metadata: subjects, ISBN, names, locations, etc. You lose match points, which may result in records not appearing (search by ISBN and the ebooks don’t show up).
    Discovery layers: good at exposing EVERYTHING (a great way to identify database cleanup projects…)
  • #20: Communication – between libraries and vendors Data points – more is better, even if they don’t display
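
The question repeated in the notes for the example slides ("how does your system handle dates?") can be made concrete with a small sketch. It pulls a four-digit publication year out of differently formatted date strings so that records can share one date facet. The pattern, the accepted year range (1500 to 2099), and the function name are assumptions chosen for illustration, and the sample strings are invented.

```python
import re

# A standalone four-digit year between 1500 and 2099 (range is an assumption).
YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")

def publication_year(raw):
    """Return the first plausible four-digit year in raw, or None."""
    match = YEAR.search(raw or "")
    return int(match.group(1)) if match else None

# Invented examples of how one publication date might arrive from different sources.
samples = ["2018", "c2018.", "[2018]", "11/13/2018", "2018-11-13", "November 13, 2018", "n.d."]

for text in samples:
    print(repr(text), "->", publication_year(text))
```

A record whose date cannot be parsed returns `None` and would drop out of any date limit or facet, another way an item ends up hidden.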