Chris Erdmann
Judy Ruttenberg
Todd Vision
NISO Virtual Conference: Open Data Projects
June 13, 2018
Community approaches to
open data at scale
Chris Erdmann
The Carpentries/California Digital Library
Metadata 2020 Participant
@libcce / chris@carpentries.org
Metadata 2020:
Who, what, when,
where, why?
As a researcher…I’m a bit bloody
fed up with Data Management -
Cameron Neylon
What is Metadata 2020?
Metadata 2020 is a collaboration that
advocates richer, connected, and reusable,
open metadata for all research outputs, which
will advance scholarly pursuits for the benefit of
society.
COMMUNITY GROUPS
RESEARCHERS
Cameron Neylon, Curtin (Chair); Bethany Drehman, FASEB; Ernesto Priego, University of London; Eva Mendez, UC3M/OSPP; Juan Pablo Alperin, Public Knowledge Project; L.K. Williams, Interfolio...
SERVICE PROVIDER/PLATFORMS AND TOOLS
Marianne Calilhanna, Cenveo Publisher Services (Chair); Adrian-Tudor Pănescu, Figshare; Bob Kasenchak, Access Innovations; Dan Nigloschy, XML workflow solutions architect...
FUNDERS
Ross Mounce, Arcadia Fund
PUBLISHERS
Daniel Shanahan, F1000 (Chair); Fiona Counsell, Taylor & Francis; Christina Gifford, Elsevier; Christina Hoppermann, Springer Nature; Concetta La Spada, Cambridge University Press…
LIBRARIANS
Juliane Schneider, Harvard Catalyst (Chair); Christopher Erdmann, North Carolina State University; Ebe Kartus, University of New England; Eva Mendez, UC3M/OSPP...
DATA PUBLISHERS AND REPOSITORIES
John Chodacki, CDL and DataCite (Chair); Barbara Chen, Modern Language Association; Jennifer Lin, Crossref; Scott Plutchak, University of Alabama at Birmingham (retired)...
Group Work
● Each group has met 5 times
● They have defined their community problem statements, outlining challenges and opportunities
● Ideas that arose from multiple meetings are now resulting in specific cross-community projects
Problem Statements, Challenges & Opportunities
Example:
Researchers have a major issue with time. Metadata entry
upon submission of research takes time, and this metadata is
often required to be entered multiple times. Streamlining is
needed. Researchers in different fields have different metadata
needs and ways of talking about metadata. There is also a lack
of knowledge surrounding the importance of complete and
accurate metadata, and the value and uses of that metadata
upstream in the research product life cycle.
Projects 1-3
1. Researcher Communications: Increase the impact
and consistency of communication with researchers
about metadata
2. Metadata Recommendations and Element
Mappings: Shared set of recommended metadata
concepts/related mappings
3. Defining the Terms We Use About Metadata:
Develop a glossary of words associated with metadata,
for core concepts and disciplinary areas
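Project 2's element mappings, and the researcher complaint about entering the same metadata several times, both come down to a crosswalk: one record entered once, then re-expressed in each target schema. A minimal Python sketch, assuming a hypothetical internal record format; the target field names are simplified stand-ins loosely modeled on Dublin Core and DataCite property names, not the project's recommended mapping (real DataCite `titles` and `creators`, for instance, are nested lists).

```python
# Hypothetical crosswalk: internal field -> field name in each target schema.
# Target names are simplified illustrations, not an official mapping.
CROSSWALK = {
    "title": {"dc": "dc:title", "datacite": "titles"},
    "creator": {"dc": "dc:creator", "datacite": "creators"},
    "year": {"dc": "dc:date", "datacite": "publicationYear"},
    "identifier": {"dc": "dc:identifier", "datacite": "identifier"},
}


def map_record(record: dict, target: str) -> dict:
    """Re-express one internal record using the target schema's field names.

    Fields with no mapping for the target are reported under "unmapped"
    instead of being silently dropped.
    """
    mapped, unmapped = {}, {}
    for field, value in record.items():
        target_field = CROSSWALK.get(field, {}).get(target)
        if target_field is None:
            unmapped[field] = value
        else:
            mapped[target_field] = value
    return {"mapped": mapped, "unmapped": unmapped}


# Hypothetical record, entered once and mapped twice.
record = {"title": "Example dataset", "creator": "Doe, J.", "year": "2018",
          "keywords": "metadata"}
dc_view = map_record(record, "dc")
datacite_view = map_record(record, "datacite")
```

Reporting unmapped fields, rather than dropping them, makes the gaps in a mapping visible, which is the kind of evidence a shared set of recommendations needs.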
Projects 4-6
4. Incentives for Improving Metadata Quality: Stories
to demonstrate how better metadata will meet
researcher goals
5. Shared Best Practices and Principles: High-level best
practices for using metadata across the scholarly
communication cycle, to facilitate interoperability and
exchange
6. Metadata Evaluation and Guidance: Identify and
compare existing metadata evaluation tools and
mechanisms to inform clear community guidance
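Many metadata evaluation tools of the kind Project 6 sets out to compare report some form of completeness. A minimal sketch of that one measure, assuming a hypothetical list of recommended fields (not a Metadata 2020 list); real tools may weight fields differently or check values, not just presence.

```python
from typing import List, Sequence, Tuple

# Hypothetical recommended fields, for illustration only.
RECOMMENDED = ("title", "creator", "identifier", "funder", "license", "affiliation")


def is_empty(value: object) -> bool:
    """Treat missing values and whitespace-only strings as empty."""
    return value is None or str(value).strip() == ""


def completeness(record: dict,
                 recommended: Sequence[str] = RECOMMENDED) -> Tuple[float, List[str]]:
    """Return the fraction of recommended fields that are filled in,
    and the names of the missing ones (in the order of `recommended`)."""
    missing = [field for field in recommended if is_empty(record.get(field))]
    score = (len(recommended) - len(missing)) / len(recommended)
    return score, missing


score, missing = completeness(
    {"title": "Example dataset", "creator": "Doe, J.",
     "identifier": "10.1234/example", "funder": "   "}
)
# score == 0.5, missing == ["funder", "license", "affiliation"]
```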
In our discussions...
Talks: SHARE & Dryad
Improving the metadata curation pipeline to SHARE
Judy Ruttenberg, Program Director for Strategic Initiatives, ARL
SHARE is a community open-source initiative developing tools and services to connect related, yet distributed,
research outputs, enabling new kinds of scholarly discovery. This talk will provide an overview of SHARE's current
development priorities to move to distributed, institutionally based infrastructure supporting local priorities, as well
as critical improvements to SHARE's harvesting framework and metadata curation pipeline.
Dryad and the evolution of metadata curation at a generalist data
repository
Todd Vision, PI, Dryad
Dryad is a generalist data repository underlying the scientific and medical literature, with data underlying articles
from hundreds of journals and authors at hundreds of institutions. In this talk, I will describe how Dryad's workflow
for metadata curation has evolved over time and contemplate how institutions and data repositories might better
interface with one another and with the world of STM publishing.
Some feedback from NASIG so far...
Focus on identifiers: they keep coming up as the source of many problems,
specifically when they’re used either inconsistently or incorrectly.
Don’t start with single volume monographs and assume serials will fit in
eventually. Many existing data models start there and get too far along before
they realize that it doesn’t quite work with serials and then the serials community
is left trying to figure out how to work around the model/system.
Realize that the volume/proliferation of materials means that a lot of libraries,
large and small, rely (to varying degrees) on vendor-provided data. We need
vendors to take ownership of that data and work on ways to ensure at least accuracy
(identifiers, spelling, website URLs, etc.).
*Thanks to Juliane Schneider & NASIG Metadata 2020 contributors
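Some identifier errors can be caught mechanically before they spread. For serials, the ISSN carries a check character (defined in ISO 3297): the first seven digits are weighted 8 down to 2 and summed, and the check is (11 - sum mod 11) mod 11, written as X when it equals 10. A minimal Python sketch; it only catches typing errors that break the checksum and cannot tell whether a valid-looking ISSN belongs to the right title.

```python
import re

# ISSN: seven digits plus a check character (0-9 or X), usually written NNNN-NNNC.
ISSN_PATTERN = re.compile(r"(\d{4})-?(\d{3})([\dX])")


def issn_check_char(first_seven: str) -> str:
    """Compute the ISSN check character: weights 8 down to 2, modulus 11."""
    total = sum(int(digit) * weight
                for digit, weight in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """True if `issn` is well formed and its check character matches."""
    match = ISSN_PATTERN.fullmatch(issn.strip().upper())
    if match is None:
        return False
    first_seven = match.group(1) + match.group(2)
    return issn_check_char(first_seven) == match.group(3)


print(is_valid_issn("0317-8471"))  # True
print(is_valid_issn("0317-8472"))  # False: check character does not match
```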
Can you help?
● Contribute to Metadata 2020 projects! Email
Clare Dean at cdean@metadata2020.org for details, or
sign up here.
● Help promote our efforts to the wider community
through your organizations, word of mouth, and social
media
● Find us on @Metadata2020 Twitter, Facebook,
LinkedIn, and at metadata2020.org
Metadata2020.org
@metadata2020
info@metadata2020.org
Thank you!
Questions?
Full Erdmann Ruttenberg Community Approaches to Open Data at Scale
SHARE is a community open-source initiative
developing tools and services to connect
related, yet distributed, research outputs,
enabling new kinds of scholarly discovery.
@SHARE_research
www.share-research.org
Metadata is data
Rich metadata ...
● Facilitates discovery
● Exposes research assets
● Contributes to meta-scholarship and
meta-analysis
Links and relationships can be analyzed from
this data
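To illustrate how links and relationships can be analyzed once they are in the metadata, here is a minimal sketch assuming relationships have already been extracted as (source, relation, target) triples. The identifiers and relation names are hypothetical, not SHARE's data model.

```python
from collections import defaultdict, deque

# Hypothetical relationship triples: (source object, relation, target object).
EDGES = [
    ("article:1", "cites", "dataset:A"),
    ("article:2", "cites", "dataset:A"),
    ("preprint:3", "is_version_of", "article:1"),
    ("dataset:A", "created_by", "person:X"),
]


def build_graph(edges):
    """Undirected adjacency sets, ignoring the relation type."""
    graph = defaultdict(set)
    for source, _relation, target in edges:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def connected_to(graph, start):
    """All objects reachable from `start` (breadth-first search), excluding start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


def inbound_counts(edges, relation):
    """How many times each object is the target of `relation`, e.g. citation counts."""
    counts = defaultdict(int)
    for _source, rel, target in edges:
        if rel == relation:
            counts[target] += 1
    return dict(counts)


graph = build_graph(EDGES)
print(sorted(connected_to(graph, "preprint:3")))  # ['article:1', 'article:2', 'dataset:A', 'person:X']
print(inbound_counts(EDGES, "cites"))  # {'dataset:A': 2}
```

A preprint that never cites the dataset directly is still connected to it, and to its creator, through the article it is a version of. That kind of indirect link is what richer relationship metadata exposes.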
Dataset
Harvesting Framework
Aggregator: OSF Preprints
Institutional focus: Dashboard
Lessons learned
Digital Humanities exploration
Dataset & Harvesting Framework
168+ data sources
● Registries (e.g. Crossref, DataCite)
● Disciplinary repositories and preprint services
● Data repositories
● Institutional repositories
● Agency repositories (e.g. DOE SciTech Connect)
55+ million metadata records
https://share.osf.io/discover
SHARE metadata priorities
● Institutional identifier
● Person identifier
● Source of funding
● Exchange across systems & borders: CC0
● Reference lists
● URI values: mapping to common values,
making them transferable
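The identifier and URI priorities above point to one recurring step: turning the many ways an identifier is written into a single canonical URI, so records from different sources can be matched. A minimal sketch for two identifier types, ORCID iDs and DOIs. The ORCID check uses the ISO 7064 MOD 11-2 scheme that ORCID documents. The DOI handling only strips common prefixes and lower-cases the result for matching, since DOI names are case-insensitive. Neither function is SHARE's code, and institutional identifiers would need a separate rule.

```python
import re
from typing import Optional

ORCID_RE = re.compile(r"(\d{4})-?(\d{4})-?(\d{4})-?(\d{3}[\dX])")
ORCID_PREFIXES = ("HTTPS://ORCID.ORG/", "HTTP://ORCID.ORG/")
DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def orcid_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits."""
    total = 0
    for digit in base15:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def normalize_orcid(value: str) -> Optional[str]:
    """Return https://orcid.org/NNNN-NNNN-NNNN-NNNN, or None if invalid."""
    text = value.strip().upper()
    for prefix in ORCID_PREFIXES:
        if text.startswith(prefix):
            text = text[len(prefix):]
            break
    match = ORCID_RE.fullmatch(text)
    if match is None:
        return None
    digits = "".join(match.groups())
    if orcid_check_char(digits[:15]) != digits[15]:
        return None
    return "https://orcid.org/" + "-".join(match.groups())


def normalize_doi(value: str) -> Optional[str]:
    """Return https://doi.org/<doi> (lower-cased for matching), or None."""
    text = value.strip()
    for prefix in DOI_PREFIXES:
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    if not (text.startswith("10.") and "/" in text):
        return None
    return "https://doi.org/" + text.lower()


print(normalize_orcid("https://orcid.org/0000000218250097"))  # https://orcid.org/0000-0002-1825-0097
print(normalize_doi("doi:10.1234/ABC"))  # https://doi.org/10.1234/abc
```

The ORCID iD in the example is ORCID's own published sample identifier. The DOI is a placeholder.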
Rich metadata, new discovery
Rich metadata, rich storytelling
Lessons learned
● Move to distributed infrastructure
● Invest more in relationship mapping among
objects in the dataset
● Build on work at the institution level
● Shared service AND reusable solutions
Decentralization of SHARE
Under development:
● Template to make writing harvester code
easy, using Node-RED
● Distributed framework for harvesting data
● Editor to clean, remediate, link harvested
data
Community, open-source software
development to solve local problems
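The slides say harvester code is being templated in Node-RED but do not show it. Many institutional repositories expose OAI-PMH, so here is a minimal Python sketch of a core harvesting loop, assuming an OAI-PMH endpoint serving Dublin Core (`oai_dc`): request ListRecords, parse the page, and follow the resumption token. It illustrates the protocol only and is not SHARE's harvester or its Node-RED template.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument, so follow-up requests
    carry only the verb and the token.
    """
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Extract Dublin Core titles and the resumption token from one response."""
    root = ET.fromstring(xml_text)
    titles = [el.text or "" for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    # A missing or empty resumptionToken element marks the last page.
    token = (token_el.text or "").strip() if token_el is not None else ""
    return titles, token or None


def harvest(base_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield titles page by page. `fetch` maps a URL to response text and is
    passed in so this sketch does no network I/O of its own."""
    token = None
    while True:
        titles, token = parse_page(fetch(list_records_url(base_url, token)))
        yield from titles
        if token is None:
            break
```

Injecting `fetch` keeps the loop testable and lets a local deployment decide on HTTP client, retries, and rate limits separately.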
Use case: Research Intelligence
“Aggregation, curation, and utilization of
metadata about research activities. [RIMs] …
help reliably connect a complex scholarly
communications landscape of researchers,
affiliations, publications, datasets, grants,
projects, and their persistent identifiers.”
OCLC Research Library Partnerships:
https://www.oclc.org/research/themes/research-collections/rim.html
VIVO - June 2018 - Durham, NC
The evolution of
metadata at a
generalist data
repository
Todd Vision
Associate Prof, Department of Biology
Adjunct, School of Information & Library Science
University of North Carolina at Chapel Hill
With thanks to
Dryad staff
Jane Greenberg and the UNC/Drexel Metadata Research Center
The long tail of orphan data
Figure: data volume plotted against rank frequency of data type, with specialized repositories (e.g. GenBank) at the head of the distribution and orphan data in the long tail. After Heidorn (2008) http://hdl.handle.net/2142/9127
Bumpus HC (1898) The Elimination of the Unfit as
Illustrated by the Introduced Sparrow, Passer
domesticus. A Fourth Contribution to the Study of
Variation. pp. 209-226 in Biological Lectures from the
Marine Biological Laboratory, Woods Hole, Mass.
Data and metadata entropy
Figure: information content of data and metadata plotted against time, starting at the time of publication. Labels: specific details, general details, accident, retirement or career change, death.
Michener, W. K., J. W. Brunt, J. Helly, T. B. Kirchner, and S. G. Stafford. 1997.
Non-geospatial metadata for the ecological sciences. Ecological Applications 7:330-342.
Joint Data Archiving Policy
Data are important products of the scientific enterprise,
and they should be preserved and usable for decades in
the future.
As a condition for publication, data supporting the results
in the article should be deposited in an appropriate
public archive.
Authors may elect to embargo access to the data for a
period up to a year after publication.
Exceptions may be granted at the discretion of the editor,
especially for sensitive information.
http://datadryad.org/pages/jdap
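The embargo clause is a simple date rule. Here is a minimal sketch of one reading of "up to a year after publication". It assumes the limit is the same calendar date one year later, with 29 February mapped to 28 February, and it ignores the editor-granted exceptions the policy allows. How Dryad actually enforces the limit is not described in these slides.

```python
from datetime import date


def one_year_after(day: date) -> date:
    """Same calendar date one year later; 29 February falls back to 28 February."""
    try:
        return day.replace(year=day.year + 1)
    except ValueError:
        return day.replace(year=day.year + 1, day=28)


def embargo_allowed(published: date, release: date) -> bool:
    """True if the release date falls between publication and one year after it."""
    return published <= release <= one_year_after(published)


print(embargo_allowed(date(2018, 6, 13), date(2019, 6, 13)))  # True
print(embargo_allowed(date(2018, 6, 13), date(2019, 6, 14)))  # False
```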
Integration of manuscript and data submission
A data “package”
Supplementary documentation
Interoperability
Data citation
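Data citation treats a dataset like any other citable output: creators, year, title, publisher, and a resolvable persistent identifier. Here is a minimal sketch that assembles a citation string in a simplified "Creator (Year). Title. Publisher. Identifier" pattern. The pattern is similar in spirit to DataCite's recommended citation format but not identical to it. All values below are hypothetical, and Dryad's own citation format may differ.

```python
from typing import Sequence


def format_data_citation(creators: Sequence[str], year: int, title: str,
                         publisher: str, doi: str) -> str:
    """Build 'Creators (Year). Title. Publisher. https://doi.org/DOI'."""
    names = "; ".join(creators)
    clean_title = title.strip().rstrip(".")  # avoid a doubled period
    return f"{names} ({year}). {clean_title}. {publisher}. https://doi.org/{doi}"


print(format_data_citation(["Doe, J.", "Roe, R."], 2018, "Example dataset.",
                           "Example Repository", "10.1234/example"))
# Doe, J.; Roe, R. (2018). Example dataset. Example Repository. https://doi.org/10.1234/example
```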
The Data Curation Network
Figure: uncurated data presents scale and expertise challenges to individual institutions; curated data at scale through a shared Data Curation Network (DCN). Lifecycle stages shown: Appraise and Select; Ingest; Preserve Long-Term; Facilitate Access. DCN workflow: Review, Assign, CURATE, Mediate, Approve.
CURATE steps:
C: Check files and metadata
U: Understand and run files
R: Request missing information
A: Augment metadata
T: Transform file formats
E: Evaluate for FAIRness
DCN – planning phase (2016-2017)
• Collaboration of six academic libraries
• Can data curation staff be shared among institutions?
• Questions
– How to address policy differences?
– What do researchers actually need help with?
– Will researchers care if curation is distributed?
– Can issues of trust and quality control be solved?
– What skills and workflows are needed?
Lisa Johnston et al. (2017) Data Curation Network: A Cross-Institutional
Staffing Model for Curating Research Data
http://hdl.handle.net/11299/188654
DCN - pilot phase (2018-2020)
Partnership with Make Data Count
datadryad.org / @datadryad
datacurationnetwork.org
