Joint Declaration of Data Citation
Principles
© 2015 Massachusetts General Hospital
and FORCE11.org
Tim Clark, Ph.D.
Assistant Professor of Neurology
Massachusetts General Hospital & Harvard Medical School
June 9, 2015
reproducibility crisis
Data Citation Implementation Guidelines By Tim Clark
Non-reproducibility
11%
Begley CG and Ellis LM, Nature 2012, 483(7391):531-533
Transparency and
Reproducibility
• Transparency is the basis of reproducibility
• What we are aiming for is robust science
• Validation from multiple orthogonal viewpoints
• Focus on transparent communication of results
Joint Declaration of Data Citation Principles
endorsed by over 90 scholarly organizations
The Brief JDDCP
1. Importance. Data are first-class
objects.
2. Credit. Support citing all
contributors to the data.
3. Evidence. Assertions must
be traceable to evidence.
4. Unique ID. Cited datasets
must have resolvable IDs.
5. Access. Data must be
robustly archived.
6. Persistence. Metadata must
persist even after data is gone.
7. Specificity & Verifiability.
Get same dynamic time-slice.
8. Interoperable & flexible.
Give cross-community support.
How to implement
JDDCP?
[Diagram labels: JDDCP; archival, id & retrieval; document model; archival & retrieval; identification; common APIs; workflows; metadata]
[Diagram labels: repositories, social science, biomedicine, earth science, climatology, scholarly publishing, web standards, scientific data standards, astronomy, physics, academic libraries, data science, software technology; archival & retrieval]
Human and machine accessibility of
cited data in scholarly publications
or, how to store and access cited
data to radically improve scholarly
transparency - and so that BOTH
humans and machines are happy.
PeerJ Computer Science 1:e1. https://dx.doi.org/10.7717/peerj-cs.1
Basic guidelines
1. Cite data as you would cite publications.
2. Deposit data in an archival-quality repository.
3. Use an identifier scheme meeting JDDCP
criteria.
4. Identifiers should resolve to a landing page,
not directly to the data.
5. Landing pages describe the data in both
human and machine readable form.
Basic guidelines (contd.)
6. Landing page & data retention may differ.
7. Repositories should provide specific
guarantee of landing page persistence.
8. Landing pages should provide both human
and machine interpretable information.
9. Provide web service accessibility.
10. Stakeholder responsibilities for ecosystem.
1. Cite data as you would
cite publications
• Strongly preferred:
• Use the NISO JATS revision 1.1d2 XML schema
• Interim (less good) alternative:
• Use own XML schema, but do what JATS does.
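A minimal sketch of what a data citation could look like when built programmatically. It assumes the JATS elements `<element-citation publication-type="data">`, `<data-title>`, `<source>` and `<pub-id>` with an `assigning-authority` attribute; check the element names against the JATS tag library before relying on them. The author, title, repository name and DOI are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def data_citation(authors, title, repository, year, doi, authority):
    """Build a JATS-style <element-citation> for a dataset (illustrative only)."""
    cite = ET.Element("element-citation", {"publication-type": "data"})
    group = ET.SubElement(cite, "person-group", {"person-group-type": "author"})
    for surname, given in authors:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given
    ET.SubElement(cite, "data-title").text = title   # title of the dataset
    ET.SubElement(cite, "source").text = repository  # repository holding it
    ET.SubElement(cite, "year").text = year
    pub_id = ET.SubElement(
        cite, "pub-id",
        {"pub-id-type": "doi", "assigning-authority": authority},
    )
    pub_id.text = doi
    return ET.tostring(cite, encoding="unicode")


# Hypothetical values, not a real dataset.
xml_text = data_citation(
    authors=[("Doe", "J")],
    title="Example survey responses",
    repository="Example Repository",
    year="2015",
    doi="10.1234/example.5678",
    authority="example-repository",
)
print(xml_text)
```

The dataset is cited with the same structure as an article reference (authors, title, source, year, identifier), which is the point of guideline 1.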
2. Deposit data in archival
quality repositories
Examples:
• NIH and EBI bioscience repositories;
• Standard earth/space/physical science repositories;
• Dataverse, Dryad, Figshare, Zenodo; etc.
Unacceptable:
• “Available on my laboratory website”.
3. Use an ID scheme that meets
JDDCP criteria (4-6)
Any currently‐available identifier scheme that is:
• Machine actionable,
• Globally unique,
• Widely used by a community, and
• Has a long term commitment to persistence
Best practice:
• use a scheme that is cross-discipline, such as
DOI.
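As an illustration of "machine actionable", the sketch below turns a DOI into a resolvable URL. The regular expression is only a rough syntactic check assumed for this example, not the official DOI grammar, and the `https://doi.org/` resolver prefix and the function name `doi_to_url` are choices made here.

```python
import re

# Rough shape of a DOI: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
RESOLVER = "https://doi.org/"  # assumed resolver prefix


def doi_to_url(doi: str) -> str:
    """Return a resolver URL for a DOI, or raise ValueError if it looks malformed."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    return RESOLVER + doi


# The DOI of the PeerJ Computer Science paper cited in this deck.
print(doi_to_url("10.7717/peerj-cs.1"))  # prints https://doi.org/10.7717/peerj-cs.1
```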
Machine accessibility
Machine accessibility in this context means:
“access by well-documented Web services—preferably
RESTful Web services—to data and metadata stored in
a robust repository, independently of integrated browser
access by humans.”
Commitment to persistence
If a resolving authority is required, that authority has
demonstrated a reasonable chance to be present and
functional in the future;
Owner of the domain or the resolving authority has
made a credible commitment to ensure that its
identifiers will always resolve.
A useful survey of persistent identifier schemes
appears in Hilse & Kothe (2006).
• Digital Object Identifiers (DOIs)
4. Identifiers should resolve to a
landing page, not directly to data
Because:
• Data may be de-accessioned, like books, but
the description of the thing cited should remain;
• Data may be restricted (e.g. Protected Health
Information; specially-licensed data; etc.);
• Data may be VERY large and user needs to
be able to decide whether to download or not.
• Content negotiation for machine access!
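Content negotiation can be sketched as follows: the same identifier URL is requested, but the `Accept` header asks for a machine-readable representation instead of the HTML landing page. The helper name `metadata_request` is hypothetical, and which media types a given resolver or repository actually honors is an assumption to verify per service. The request is only constructed here, not sent.

```python
import urllib.request


def metadata_request(identifier_url, media_type="application/ld+json"):
    """Build (but do not send) a GET request asking for machine-readable metadata."""
    return urllib.request.Request(identifier_url, headers={"Accept": media_type})


req = metadata_request("https://doi.org/10.7717/peerj-cs.1")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending it would be: urllib.request.urlopen(req)
# A browser requesting the same URL with Accept: text/html gets the landing page.
```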
5. Landing pages describe the data
Best practices:
• Identifier, title, description, creator,
publisher/contact, publication/release date,
version.
Additional:
• Creator identifier (e.g. ORCID), license
Content encoding:
• HTML; plus…
• At least one non-proprietary machine-readable
format, e.g. XML, JSON/JSON-LD, RDF,
microformats, microdata, RDFa,…
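One possible machine-readable encoding of the fields above is JSON-LD. The sketch below maps them onto schema.org `Dataset` properties; that mapping (for example title to `name`, release date to `datePublished`) is an assumption of this example rather than something the guidelines prescribe, and all values are hypothetical placeholders.

```python
import json

# Best-practice fields from the slide, under assumed schema.org property names.
REQUIRED = ("identifier", "name", "description", "creator",
            "publisher", "datePublished", "version")


def landing_page_metadata(**fields):
    """Serialize landing page metadata as JSON-LD, refusing incomplete records."""
    missing = [key for key in REQUIRED if not fields.get(key)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    record = {"@context": "https://schema.org", "@type": "Dataset"}
    record.update(fields)
    return json.dumps(record, indent=2)


# Hypothetical dataset description.
example = landing_page_metadata(
    identifier="https://doi.org/10.1234/example.5678",
    name="Example survey responses",
    description="Placeholder description of a dataset.",
    creator={"@type": "Person", "name": "J. Doe"},
    publisher="Example Repository",
    datePublished="2015-06-09",
    version="1",
    license="https://creativecommons.org/publicdomain/zero/1.0/",  # optional extra
)
print(example)
```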
Serving the landing pages
“To enable automated agents to extract the metadata
these landing pages should include an HTML <link>
element specifying a machine readable form of the
page as an alternative.”
“For those that are capable of doing so, we
recommend also using Web Linking (Nottingham,
2010) to provide this information from all of the
alternative formats.”
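From the consuming side, an automated agent might discover the machine-readable form like this. The sketch parses a landing page and collects `<link rel="alternate">` elements using only the standard library; the sample HTML and its `href` are made up for illustration.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "alternate" in rels and attributes.get("href"):
            self.alternates.append((attributes.get("type"), attributes["href"]))


# Hypothetical landing page markup.
SAMPLE_PAGE = """
<html><head>
  <title>Example survey responses</title>
  <link rel="stylesheet" href="/site.css">
  <link rel="alternate" type="application/ld+json" href="/datasets/example-1.jsonld">
</head><body><h1>Example survey responses</h1></body></html>
"""

parser = AlternateLinkParser()
parser.feed(SAMPLE_PAGE)
print(parser.alternates)
```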
6. Landing page retention may differ
from data retention
Because:
• Repositories cannot commit to keeping
arbitrary and possibly very large volumes of
data forever!
• But when data is de-accessioned, the citation
identifier must not give a 404 error.
• Retain awareness of what was cited even if it
is not currently extant in a particular repository.
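The behaviour described above can be sketched as a resolver that keeps answering for a de-accessioned dataset. The record layout, the `availability` values and the function name `resolve` are hypothetical; the only point is that a known identifier still yields a landing page once the data is gone.

```python
def resolve(identifier, records):
    """Return (HTTP status, landing page metadata) for a cited identifier."""
    record = records.get(identifier)
    if record is None:
        return 404, None  # the identifier was never issued by this repository
    page = dict(record["metadata"])
    # The landing page outlives the data: report availability, do not fail.
    page["availability"] = "available" if record["data_present"] else "de-accessioned"
    return 200, page


# Hypothetical repository contents.
RECORDS = {
    "example-1": {"metadata": {"name": "Example dataset"}, "data_present": True},
    "example-2": {"metadata": {"name": "Withdrawn dataset"}, "data_present": False},
}

print(resolve("example-2", RECORDS))
```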
7. Repositories should provide a
specific guarantee of persistence for
landing pages
Model guarantee language:
“[Organization/Institution Name] is committed to maintaining
persistent identifiers in [Repository Name] so that they will
continue to resolve to a landing page providing metadata
describing the data, including elements of stewardship,
provenance, and availability.
[Organization/Institution Name] has made the following plan
for organizational persistence and succession: [plan].”
8. Landing pages should provide
both human and machine
interpretable information.
Because:
• Mash-ups and distributed search.
• Apps that you haven’t yet thought of.
• Web services.
Examples of machine interpretable info:
• RDF, RDFa, XML, microformats, JSON-LD,
etc.
9. Provide web service accessibility
Because:
• Service composition, new apps, etc.
Best practice:
• RESTful web service, because this is a data-oriented
application and required functionality.
Much less good practice:
• SOAP, because SOAP is process-oriented.
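A data-oriented service in this sense exposes each dataset description as a resource that is read with a plain GET. The sketch below is a minimal WSGI application using only the standard library; the `/datasets/<id>` URL layout, the in-memory `DATASETS` table and the `call` helper are assumptions of this example, not part of the guidelines.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory metadata store.
DATASETS = {"example-1": {"identifier": "example-1", "name": "Example dataset", "version": "1"}}
PREFIX = "/datasets/"


def app(environ, start_response):
    """Serve dataset metadata as JSON at /datasets/<id> (read-only)."""
    path = environ.get("PATH_INFO", "")
    record = DATASETS.get(path[len(PREFIX):]) if path.startswith(PREFIX) else None
    if environ.get("REQUEST_METHOD") != "GET":
        status, body = "405 Method Not Allowed", {"error": "only GET is supported"}
    elif record is None:
        status, body = "404 Not Found", {"error": "unknown dataset"}
    else:
        status, body = "200 OK", record
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def call(path, method="GET"):
    """Invoke the app in-process, without opening a network socket."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], json.loads(body)


print(call("/datasets/example-1"))
```

To serve it over HTTP for local experiments, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would be one option.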
10. Stakeholder
responsibilities
• Archives and repositories: Ids, resolution, landing
page metadata, dataset description, data access
methods conform to these recommendations.
• Registries of repositories: Document conformance.
• Researchers: Treat data as first-class objects.
• Funders, scholarly societies, academic institutions:
Strongly encourage conformance to best practices.
Summary
• Use NISO JATS 1.1d2 to publish & archive documents.
• Cite datasets as if they were publications and deposit
datasets in archival repositories.
• Follow human & machine accessibility guidelines as
presented above in points 3 through 9.
• Adhere to stakeholder responsibilities as in point 10.
• Welcome to the future of scholarly publishing!
Acknowledgements
• Joan Starr, California Digital Library
• other co-authors of the “Achieving Human and Machine
Accessibility” publication
• FORCE11 Data Citation Implementation Group
• Maryann Martone, UCSD & FORCE11
• John Kunze, California Digital Library
• Harry Hochheiser, University of Pittsburgh
• Phil Bourne, NIH Data Science Directorate
Questions?
