Automated creation of analytic
catalog records for born-digital
journal articles
Kevin S. HAWKINS
@KevinSHawkins
Digital Library
Production Service
Opportunities
• HathiTrust
– offers a better infrastructure for development
than DLXS
– is certified by Trustworthy Repositories Audit &
Certification (TRAC)
• There’s growing interest among institutions in
building a shared infrastructure for publishing.
mPach: what are we creating?
• modular platform
• tightly coupled with the HathiTrust repository
• for open-access journals
• all you need to publish and preserve an OA
journal
• will integrate with Open Journal Systems (OJS)
mPach Prepper (slides 1 through 8 of 8)
Questions so far?
HathiTrust’s Bibliographic Metadata
Specifications
When a HathiTrust partner institution provides a
digital object for inclusion in HathiTrust, it must
provide a catalog record in MARCXML format
using fields as defined in the Bibliographic
Metadata Specifications, an extension of MARC
21 minimal-level requirements.
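As a rough illustration of what "a catalog record in MARCXML format" looks like, here is a minimal Python sketch that builds a record skeleton with the standard library. The MARCXML namespace is the Library of Congress one; the leader values, the choice of field 245, and the helper names `datafield` and `minimal_record` are assumptions made for illustration and are not taken from the Bibliographic Metadata Specifications.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def datafield(record, tag, ind1, ind2, subfields):
    """Append a MARCXML datafield holding the given (code, value) subfields."""
    field = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                          {"tag": tag, "ind1": ind1, "ind2": ind2})
    for code, value in subfields:
        sub = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": code})
        sub.text = value
    return field


def minimal_record(title):
    """Build a skeleton record: a leader plus a 245 title field (hypothetical helper)."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    leader = ET.SubElement(record, f"{{{MARC_NS}}}leader")
    # Placeholder 24-character leader (assumed values); position 7 is 'b',
    # the MARC 21 code for a serial component part.
    leader.text = "00000nab a2200000 u 4500"
    datafield(record, "245", "0", "0", [("a", title)])
    return record


print(ET.tostring(minimal_record("A sample article title"), encoding="unicode"))
```

A real record would carry whatever additional fields the specifications require; this only shows the container structure.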
What is the repository unit (barcode
equivalent) for born-digital journals?
an individual article
But …
There is also metadata that relates to the
journal as a whole, such as:
• title of the journal
• name of the publisher
• place of publication
What to do with these?
mPach’s solution: creating two kinds of
records conforming to HathiTrust’s
Bibliographic Metadata Specifications
• Serial record for the journal: created manually
• Analytic record for each article: created automatically by mPach’s Prepper
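One conventional way for an analytic record to point back to its serial record in MARC 21 is field 773 (Host Item Entry). The slides do not say which field mPach uses, so the Python sketch below is an assumption: it shows how journal-level metadata could be carried on an article record through 773 subfields $t (title), $x (ISSN), and $g (related parts). The function name `host_item_entry` is hypothetical, and namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def host_item_entry(journal_title, issn, related_parts):
    """Build a 773 (Host Item Entry) field linking an article to its journal."""
    # ind1 "0" asks systems to display the note; ind2 blank gives the "In" label.
    field = ET.Element("datafield", {"tag": "773", "ind1": "0", "ind2": " "})
    for code, value in (("t", journal_title), ("x", issn), ("g", related_parts)):
        if value:  # skip subfields for which no value was supplied
            ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


link = host_item_entry("Journal of Examples", "1234-5678", "Vol. 4 (2014)")
print(ET.tostring(link, encoding="unicode"))
```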
Workflow for manual creation of serial
records (1/2)
When a new journal comes along that will use
mPach:
1. Journal editor fills out a form that asks for:
– journal title
– any alternative titles or abbreviations
– any previous titles
– any ISSNs related to the journal
– a short description of the scope of the journal
Workflow for manual creation of serial
records (2/2)
2. A serials cataloger will check to see if the
HathiTrust catalog already contains a record
for the journal (or for any previous titles).
Existing records will be modified, a new record will
be created, or both, and the result will link to the
journal’s homepage.
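The cataloger's check could be supported by a simple lookup. The following Python sketch is purely illustrative: `JournalForm` mirrors the form fields listed in step 1, and the catalog is a hypothetical in-memory list of dictionaries, not the HathiTrust catalog or any real API. It only surfaces candidate records; deciding whether to modify, create, or both remains with the cataloger.

```python
from dataclasses import dataclass, field


@dataclass
class JournalForm:
    """Answers a journal editor gives on the intake form (field names are assumptions)."""
    title: str
    alternative_titles: list = field(default_factory=list)
    previous_titles: list = field(default_factory=list)
    issns: list = field(default_factory=list)
    scope: str = ""


def normalize_issn(issn):
    """Drop the hyphen and surrounding spaces so '1234-5678' equals '12345678'."""
    return issn.replace("-", "").strip().upper()


def find_candidates(form, catalog):
    """Return catalog entries that share an ISSN or a title with the form."""
    wanted_issns = {normalize_issn(i) for i in form.issns}
    wanted_titles = {t.casefold() for t in
                     [form.title, *form.previous_titles, *form.alternative_titles]}
    hits = []
    for rec in catalog:
        rec_issns = {normalize_issn(i) for i in rec.get("issns", [])}
        if wanted_issns & rec_issns or rec.get("title", "").casefold() in wanted_titles:
            hits.append(rec)
    return hits
```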
Full view v. 4 (2014) - (original from University of Michigan)
So can users only discover articles by way of the
journal homepage?
Nope!
The analytic records for each article will also be in
the HathiTrust catalog, so you can find articles
directly (if, for example, you search the catalog for
a known article title).
Automatic creation of article records
(1/2)
To review, the user (e.g., the journal editor) uses
mPach’s Prepper to prepare an article for
ingest into HathiTrust.
A combination of paragraph styles in Microsoft
Word and manually entered metadata in
Prepper ensures that the bibliographic
metadata is properly encoded in JATS XML.
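To make "bibliographic metadata encoded in JATS XML" concrete, here is a small Python sketch that reads the front matter of a JATS article. The element names are standard JATS; the sample document, its values, and the function name `read_metadata` are invented for illustration and do not come from Prepper's actual output.

```python
import xml.etree.ElementTree as ET

# Invented sample; real Prepper output would carry more elements.
SAMPLE_JATS = """<article>
  <front>
    <journal-meta>
      <journal-title-group><journal-title>Journal of Examples</journal-title></journal-title-group>
      <issn>1234-5678</issn>
      <publisher>
        <publisher-name>Example Press</publisher-name>
        <publisher-loc>Ann Arbor, MI</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <title-group><article-title>A Sample Article</article-title></title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name><surname>Doe</surname><given-names>Jane</given-names></name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
</article>"""


def read_metadata(jats_xml):
    """Pull journal-level and article-level metadata out of the JATS <front>."""
    front = ET.fromstring(jats_xml).find("front")
    return {
        # findtext returns only leading text, so titles with inline markup
        # (e.g. <italic>) would need extra handling.
        "article_title": front.findtext("article-meta/title-group/article-title"),
        "journal_title": front.findtext("journal-meta/journal-title-group/journal-title"),
        "issn": front.findtext("journal-meta/issn"),
        "publisher": front.findtext("journal-meta/publisher/publisher-name"),
        "place": front.findtext("journal-meta/publisher/publisher-loc"),
        "authors": [
            (name.findtext("surname"), name.findtext("given-names"))
            for name in front.findall("article-meta/contrib-group/contrib/name")
        ],
    }


meta = read_metadata(SAMPLE_JATS)
print(meta["article_title"], meta["authors"])
```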
Automatic creation of article records
(2/2)
So because we’ll have data that is correctly
structured and actually correct, we will be able
to map from JATS XML to the fields required to
create an analytic MARCXML record for the
article.
Each analytic record will be created automatically at
the time that an article is ingested.
Our crosswalk, developed with significant
assistance from Steven Holloway at ATLA, was
donated to the JATS community on the JATS wiki.
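A crosswalk of this kind can be thought of as a table of source paths and target fields. The Python sketch below is not the crosswalk donated to the JATS wiki; the three mapping rows, the target tags, and the function name `crosswalk` are assumptions chosen only to show the table-driven idea. Namespaces and the leader are left out to keep it short.

```python
import xml.etree.ElementTree as ET

# (JATS path under <front>, MARC tag, ind1, ind2, subfield code): illustrative rows only
CROSSWALK = [
    ("article-meta/title-group/article-title", "245", "0", "0", "a"),
    ("journal-meta/journal-title-group/journal-title", "773", "0", " ", "t"),
    ("journal-meta/issn", "773", "0", " ", "x"),
]

EXAMPLE = (
    "<article><front>"
    "<journal-meta><journal-title-group><journal-title>Journal of Examples"
    "</journal-title></journal-title-group><issn>1234-5678</issn></journal-meta>"
    "<article-meta><title-group><article-title>A Sample Article</article-title>"
    "</title-group></article-meta>"
    "</front></article>"
)


def crosswalk(jats_xml):
    """Apply each mapping row, grouping subfields that share a MARC tag."""
    front = ET.fromstring(jats_xml).find("front")
    record = ET.Element("record")
    fields = {}
    for path, tag, ind1, ind2, code in CROSSWALK:
        value = front.findtext(path)
        if not value:
            continue  # source element absent: emit nothing for this row
        if tag not in fields:
            fields[tag] = ET.SubElement(
                record, "datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
        ET.SubElement(fields[tag], "subfield", {"code": code}).text = value.strip()
    return record


print(ET.tostring(crosswalk(EXAMPLE), encoding="unicode"))
```

Keeping the mapping in data rather than in code makes it easier to review against a published crosswalk and to extend one row at a time.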
But how good are these records? Do
they follow AACR2 or RDA?
Not in the following ways:
• Records will not have titles of articles transcribed
according to AACR2/RDA; instead, they will be in
the record as displayed in the article.
• Names will be handled as the mPach user spelled
them and divided them into forenames and
surnames.
• We haven’t bothered with choosing a main entry:
all access points are added entries.
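The last two points can be sketched in Python as follows, under the assumption that personal names land in MARC field 700 (added entry, personal name); the slides do not name the field, and `name_added_entries` is a hypothetical helper. Names are passed through exactly as the user split them, with no authority-file normalization, and no 100 (main entry) field is produced.

```python
import xml.etree.ElementTree as ET


def name_added_entries(contributors):
    """Return one 700 field per (surname, forenames) pair, in the order given."""
    fields = []
    for surname, forenames in contributors:
        # Use the user's own split; no respelling or inversion rules are applied.
        heading = f"{surname}, {forenames}" if forenames else surname
        # ind1 "1" marks the entry element as a surname.
        field = ET.Element("datafield", {"tag": "700", "ind1": "1", "ind2": " "})
        ET.SubElement(field, "subfield", {"code": "a"}).text = heading
        fields.append(field)
    return fields


for f in name_added_entries([("Doe", "Jane"), ("Roe", "Richard")]):
    print(ET.tostring(f, encoding="unicode"))
```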
For anyone interested, I have an annotated
handout of a working document showing
how the analytic and serial records will
relate to each other and the other
components of mPach.
Questions?
http://www.lib.umich.edu/mpach

