Citation and Research Objects:
Toward Active Research Objects
Research Objects 2019,
eScience 2019, 24 September 2019
Daniel S. Katz
(d.katz@ieee.org, http://danielskatz.org, @danielskatz)
Assistant Director for Scientific
Software & Applications, NCSA
Research Associate Professor,
CS, ECE, iSchool
https://doi.org/10.5281/zenodo.3338176
Definitions
• Simple research object
• Small unit of citable work
• E.g., paper, dataset, version of software, etc.
• Complex research object
• Collection of multiple simple research objects
• E.g., Research Objects as thought of in this workshop
Citing simple research objects
• Recent progress in creating principles for citing simple research
objects, such as data [1] and software [2]
• Different principles because these are fundamentally different objects [3]
• More recently, community efforts to implement these citation
principles
• Data: FORCE11 Data Citation Implementation Group
• Software: FORCE11 Software Citation Implementation Working Group
• Data (in the context of the FAIR data principles): Enabling FAIR Data [4]
• Note: There’s no widely-accepted equivalent of FAIR data
principles for software or for other research objects, though some
researchers are working in this area
[1] Joint declaration of data citation principles https://doi.org/10.25490/a97f-egyk
[2] Software citation principles https://doi.org/10.7717/peerj-cs.86
[3] Software vs. data in the context of citation https://doi.org/10.7287/peerj.preprints.2630v1
[4] The FAIR guiding principles for scientific data management and stewardship
https://doi.org/10.1038/sdata.2016.18
How to cite simple research objects
• Follow the example of the long-established method for citing papers:
1. Deposit item (data, software) and associated metadata in an archival
repository
• Possible peer-review or repository checks
2. The repository (aka publisher) stores/archives the item and metadata;
provides an identifier that can be used to retrieve them
3. Identifier and metadata are used to cite the object
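The three steps above can be sketched in Python. This is a minimal illustration, not a prescribed format: the `Record` fields, the `format_citation` helper, the citation layout, and the placeholder DOI are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Metadata a repository might hold for a deposited item (hypothetical fields)."""
    authors: List[str]
    year: int
    title: str
    publisher: str    # the archival repository acting as publisher
    identifier: str   # identifier assigned by the repository, here a DOI
    version: str = ""  # optional, mainly relevant for software


def format_citation(record: Record) -> str:
    """Build one citation string from the identifier and metadata (step 3)."""
    authors = ", ".join(record.authors)
    version = f" (Version {record.version})" if record.version else ""
    return (
        f"{authors} ({record.year}). {record.title}{version}. "
        f"{record.publisher}. https://doi.org/{record.identifier}"
    )


if __name__ == "__main__":
    # Placeholder values; not a real deposit.
    example = Record(["Doe, J."], 2019, "Example Dataset",
                     "Example Repository", "10.1234/example")
    print(format_citation(example))
```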
Citing complex research objects?
• Complex research objects are objects that contain other objects,
e.g., “Research Objects”
• What could be cited?
• Entire complex object (as a single entity)
• Some of the contained objects (which may already have identifiers)
• Both
• How to cite?
• Two proposals follow
Basic citation of complex research objects
• Proposal 1: Treat complex research object as a container and a set of contents & cite both
complex research object and all the contained objects that were used
• FORCE11 Software Citation Implementation Working Group recently defined some
challenges [5]
• One is how to cite complex software objects, namely frameworks that include components
• A framework can have lots of components
• Only some components are used in a particular research project
• So a set of citations for that project should cite the framework and the components that were used
• Citation of Research Objects (ROs) [6] is similar
• RO itself should be cited, plus objects in RO that are used, not those that are not
• Citing objects in the RO can then be handled similarly to how those objects outside an RO would be
cited, whether they are data, software, or something else
• Note: this relies on separability of objects; not the case for some complex research objects, e.g.,
Jupyter Notebooks, where all the software, data, and text are bundled in such a way that they cannot
be separated and individually cited
[5] Software citation implementation challenges http://arxiv.org/abs/1905.08674
[6] Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
How to cite complex research objects
• The necessary steps are thus:
1. Tracking what parts of the RO were used (both the RO itself and the objects within it)
2. Finding identifiers & other metadata for the RO and its objects that were used
3. Building correctly formatted citations for the RO and its objects that were used
• Step 1 is the greatest challenge
• With current Research Objects, this must be done outside the RO, either manually or by
tools that use the RO (e.g., an electronic notebook system)
• For Steps 2 and 3
• Cite the RO as a data object; follow data citation principles
• Cite software, data, and documentation objects in an RO as you would for any software
or data objects or papers
• Contents may have identifiers already based on their existence outside the RO, or they
can be given identifiers when the RO is given an identifier, with suitable relationship
metadata between the RO and the content
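A rough Python sketch of Proposal 1, assuming the RO can be described as a plain dictionary. `EXAMPLE_RO`, `citations_for`, and every name and DOI in it are hypothetical placeholders. Usage is tracked outside the RO and passed in by the caller, as the slide describes for current Research Objects.

```python
# Hypothetical description of an RO: a citation for the RO itself plus one
# citation per contained object. All names and DOIs below are placeholders.
EXAMPLE_RO = {
    "citation": "Doe, J. (2019). Example RO. Example Repository. "
                "https://doi.org/10.1234/ro",
    "contents": {
        "data.csv": "Doe, J. (2018). Example data. Example Repository. "
                    "https://doi.org/10.1234/data",
        "model.py": "Roe, R. (2019). Example model (Version 1.0). "
                    "Example Repository. https://doi.org/10.1234/model",
        "notes.txt": "Doe, J. (2019). Example notes. Example Repository. "
                     "https://doi.org/10.1234/notes",
    },
}


def citations_for(ro, used):
    """Return the RO's citation followed by citations for the used objects only."""
    unknown = sorted(set(used) - set(ro["contents"]))
    if unknown:
        raise KeyError(f"not in the RO: {unknown}")
    # Step 1 (tracking) happens outside the RO: `used` is supplied by the caller.
    # Steps 2 and 3 are reduced here to looking up pre-formatted citations.
    return [ro["citation"]] + [ro["contents"][name] for name in sorted(set(used))]


if __name__ == "__main__":
    # Only data.csv and model.py were used, so notes.txt is not cited.
    for line in citations_for(EXAMPLE_RO, {"data.csv", "model.py"}):
        print(line)
```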
Active research objects and citation
• Move beyond current Research Objects to automatically track usage of objects inside ROs
• As stated on http://www.researchobject.org: “Enriching these resources and collections with any and all additional
information required to make research reusable, and reproducible!”
• Proposal 2: Active Research Objects (AROs), which add internal data and methods to the RO
• Basic ARO methods: put() and get() to place and access the object within the ARO
• put() requires data beyond the object being placed
• Data currently required by many ROs, including description, checksum, etc.
• External identifier (DOI) and a citation
• Perhaps also internal identifier (e.g., IDO [7])
• get() tracks when an object is accessed
• ARO data includes: flags for each internal object
• Initially set to false when object is put
• Set to true when the object is accessed via get()
• Next ARO method: validate() method to provide fixity
• Final ARO method: citation(), similar to the citation method in R [8], except can be used to obtain
citation for whole RO, citations for RO and internal objects that have been used, or citation for one
specific internal object
[7] Identifiers for Digital Objects: the Case of Software Source Code Preservation https://hal.archives-ouvertes.fr/hal-01865790
[8] Citing R https://cran.r-project.org/doc/FAQ/R-FAQ.html#Citing-R
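One way the ARO methods might look, as a minimal in-memory Python sketch. The slides name only put(), get(), validate(), and citation(). The class name, parameter names, return types, and the choice to compute a SHA-256 checksum inside put() (rather than have the caller supply one) are assumptions made for illustration.

```python
import hashlib


class ActiveResearchObject:
    """Illustrative ARO: stores objects, flags access, and reports citations."""

    def __init__(self, citation):
        self._citation = citation  # citation for the ARO as a whole
        self._objects = {}         # name -> metadata, content, and usage flag

    def put(self, name, content, description, citation, identifier=None):
        """Place an object (bytes) in the ARO along with its metadata."""
        self._objects[name] = {
            "content": content,
            "description": description,
            "checksum": hashlib.sha256(content).hexdigest(),
            "identifier": identifier,  # external identifier, e.g., a DOI
            "citation": citation,
            "used": False,             # flag starts as false when the object is put
        }

    def get(self, name):
        """Return the object's content and flag the object as used."""
        entry = self._objects[name]
        entry["used"] = True
        return entry["content"]

    def validate(self):
        """Fixity check: every stored object still matches its checksum."""
        return all(
            hashlib.sha256(entry["content"]).hexdigest() == entry["checksum"]
            for entry in self._objects.values()
        )

    def citation(self, name=None, used_only=False):
        """Citations for one object, for the ARO plus used objects, or for the ARO."""
        if name is not None:
            return [self._objects[name]["citation"]]
        if used_only:
            used = [e["citation"] for e in self._objects.values() if e["used"]]
            return [self._citation] + used
        return [self._citation]
```

With this sketch, calling get() on only one of two stored objects means citation(used_only=True) returns the ARO's citation plus that one object's citation. Unlike get(), validate() reads content directly and so does not mark anything as used.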
Acknowledgements
• Prior support from NIH Data Commons Pilot
Program Consortium (DCPPC) via Harvard as
part of Team Sodium
• Thanks!
• Questions?