| 1
Anita de Waard, VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
December 19, 2016
Elsevier's RDM Program:
Ten Habits of Highly Effective Data
| 2
A Maslow Hierarchy for Research Data:
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
Save, Share, Use
10. Integrate upstream and downstream – make metadata to serve use.
9. Re-usable (allow tools to run on it)
8. Reproducible
7. Trusted (e.g. reviewed)
6. Comprehensible (description / method is available)
5. Citable
4. Discoverable (data is indexed or data is linked from article)
3. Accessible
2. Preserved (long-term & format-independent)
1. Stored (existing in some form)
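One way to read this hierarchy is as an ordered checklist. The sketch below is only an illustration: the level names are shortened from the slide, and the rule that a level counts only when every level beneath it is also met is an assumption borrowed from the Maslow analogy, not something the slide states.

```python
from typing import Iterable

# The ten aspects from the slide, ordered from the base of the pyramid (1) upward (10).
# The short names are illustrative labels, not an official vocabulary.
LEVELS = [
    "stored",
    "preserved",
    "accessible",
    "discoverable",
    "citable",
    "comprehensible",
    "trusted",
    "reproducible",
    "reusable",
    "integrated",
]


def highest_level(satisfied: Iterable[str]) -> int:
    """Return how many consecutive levels, counted from the base, are satisfied.

    Assumption (not stated on the slide): a level only counts when every
    level below it is also met, in the spirit of a Maslow-style pyramid.
    """
    met = {name.lower() for name in satisfied}
    count = 0
    for name in LEVELS:
        if name not in met:
            break
        count += 1
    return count


if __name__ == "__main__":
    # "discoverable" is missing, so "citable" does not count toward the total.
    dataset = {"stored", "preserved", "accessible", "citable"}
    print(highest_level(dataset))  # 3
```

Under this reading, a dataset with a DOI that is not indexed anywhere would still sit at level 3 until it becomes discoverable.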
| 3
Store, Preserve: Data Rescue Award
| 4
Store: Hivebench
www.hivebench.com
| 5
Store, Access: Mendeley Data
https://data.mendeley.com/
• Linked to published papers – or not
• Linked to GitHub – or not
• Versioning and provenance tracking
• Different licenses: GNU GPL, CC-BY, CC0, etc.
| 6
Access, Cite: Data Linking
• Integrated in paper submission process
• Supplementary data is never behind a firewall
• Closely integrated with > 150 databases:
| 7
Access, Discover: Scholix/DLIs
• ICSU-WDS/RDA Publishing Data Service Working Group, merged with National Data Service pilot
• Cross-stakeholder – with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others
• Proposed long-term architecture and interoperability framework: www.scholix.org
• Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 million links from various sources)
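To make the idea of an article-to-dataset link concrete, here is a minimal sketch of how such links might be held and grouped. The `Link` fields, the `datasets_by_article` helper, and the identifiers are all hypothetical; this is not the Scholix schema or the DLI service API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One article-to-dataset link. Field names are illustrative, not the Scholix schema."""

    source_id: str  # identifier of the article, e.g. a DOI
    target_id: str  # identifier of the dataset
    provider: str   # organisation that contributed the link


def datasets_by_article(links):
    """Group dataset identifiers by the article that links to them, dropping duplicates."""
    grouped = defaultdict(set)
    for link in links:
        grouped[link.source_id].add(link.target_id)
    return {article: sorted(targets) for article, targets in grouped.items()}


if __name__ == "__main__":
    # Made-up identifiers; two providers report the same link for article-1.
    links = [
        Link("doi:10.5555/article-1", "doi:10.5555/dataset-a", "publisher"),
        Link("doi:10.5555/article-1", "doi:10.5555/dataset-a", "repository"),
        Link("doi:10.5555/article-1", "doi:10.5555/dataset-b", "repository"),
    ]
    print(datasets_by_article(links))
```

Collecting links from several stakeholders means the same link can arrive more than once, which is why the sketch stores targets in a set before sorting them.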
| 8
Cite: Force11
https://www.elsevier.com/connect/data-citation-is-becoming-real-with-force11-and-elsevier
| 9
Discover: Datasearch
https://datasearch.elsevier.com
| 10
Review: Research Elements
https://www.elsevier.com/books-and-journals/research-elements
Figure labels: Data articles, Software articles, Method articles, Protocols, Video articles, Hardware articles, Lab resources, Full Research paper
• Brief article types designed to communicate a specific element of the research cycle
• Complementary to full research papers
• Easy to prepare and submit
• Peer-reviewed and indexed
• Receive a DOI and are fully citable
• Allow citable post-publication updates
• Primarily Open Access (CC-BY)
• Published in multidisciplinary and domain-specific journals
| 11
Reproduce: Some Journal Efforts:
• Cortex Registered Reports:
- Method and proposed analysis are submitted for pre-registration
- Paper is conditionally accepted
- Research is executed
- Full paper is submitted and accepted, provided that the protocol is followed
• Reproducibility Papers:
- Describe all the software and data used to derive the published results, and provide instructions on how to reproduce and validate those results.
- Using Mendeley Data, authors also submit their code, data, and optionally a ReproZip package or a Docker container to make the review process easier.
- Reviewers not only review the reproducibility paper, but also validate the results and claims published in the original manuscript.
- Once the paper is accepted, (non-blind) reviewers also become co-authors and are encouraged to add a section to the paper that states the extent to which the software is portable, is robust to changes, and is likely to be usable.
| 12
Metrics for Institutions: Data Lighthouse
Diagram labels: Research article published; Initial inquiry; Share, publish and link data; Monitor progress and provide guidance; Generate reports.
What?
• Service for research institutes (esp. librarians) to engage with researchers throughout the research data life cycle.
How?
Offer a service for librarians to interact with researchers regarding the RDM process, to:
• Offer solutions to store, share, link and publish data
• Monitor progress and report on posting, citation and downloads of datasets
• Provide monthly reporting
| 13
In summary:
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
Save, Share, Use
10. Integrate upstream and downstream – make metadata to serve use.
9. Re-usable
8. Reproducible
7. Trusted
6. Comprehensible
5. Citable
4. Discoverable
3. Accessible
2. Preserved
1. Stored
Legend: Elsevier Efforts, Collaborative Efforts
Efforts named on the slide: Data at Risk, Reproducibility Initiative, Data Lighthouse
| 14
“Now show me how all of this works
together… on one of my papers!”
• Phil Bourne, August 2016
See Demo
| 15
A Tale of (Ir)reproducibility
There once was a computational biology paper…
Kinnings et al. 2010, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000976
| 16
A Tale of (Ir)reproducibility
... that couldn’t be (easily) reproduced.
| 17
A Tale of (Ir)reproducibility
Some brave souls did reproduce it …
Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
| 18
A Tale of (Ir)reproducibility
… but it was a lot of work.
Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
| 19
Some tools to improve this:
1. Store protocols in an Electronic Lab Notebook.
• Keep collection of protocols online
• Edit, export, share
| 20
Some tools to improve this:
2. Run experiments from this Lab Notebook.
• Edit, export, share
• Base on saved protocols
• Save and export outputs
| 21
Some tools to improve this:
3. Export results to a trusted data repository.
• Describe how the experiment can be reproduced
• Keep track of versions of the dataset
• Create DOI for citation
• Link back to protocols
• Store up to 5 GB of data in many formats
| 22
Some tools to improve this:
4. Publish in a data journal & link back.
• Journal focuses on method reproduction
• Link to protocols
• Link to data
• Fully OA
| 23
The Moral of this Story:
• How are we improving the ‘old way of working’?
- Methods and data can be stored by researchers directly during the experiment, so the 270 hours of reproduction effort could shrink toward 0 (given that the protocol is stored for reuse during the experiment)
- Better reproducibility because tools and methods are stored as part of the work itself, with no need to recap, rebuild, and recover
- More accurate workflow representation because progress is tracked while it happens, not just afterwards
• Are we there yet?
- We’re getting somewhere: “Your tools […], layer a UI on top of a whole set of disjointed components; this is ultimately what people want!” Phil Bourne, ADDS NIH
- But we’re not quite there:
o Running code from within the tools (planned)
o Even easier exporting/publishing workflows (planned)
o Integration with other tools: ELNs, (institutional) repositories, journals, sharing platforms (planned)
| 24
A development partnership proposal:
1. You try out our tools:
- Institutional install for Hivebench
- Installation of Mendeley Data (in the cloud, later on local service)
- If interested: Data Lighthouse pilot
2. In return, you help us explore what these tools will look like:
- Connect us to PIs/postdocs/grad students who are interested in trying out Hivebench/Mendeley Data
- We ask them for feedback on the tools and help with any issues
- You explore Data Lighthouse and tell us what you would like to see in terms of reporting, emails, etc.
3. Timeframe:
- Start by signing an MoU (no money changes hands; we provide services/software/support, you help connect us to researchers and provide feedback)
- We evaluate collaboration after 6 months, see if anything needs to change
- Tools are free for 24 months, no other obligations.
| 25
Hivebench Features:
Fully-fledged electronic online notebook.
Allows researchers to manage:
• Experiments,
• Protocols,
• Reagents,
• Research Data (integrated with Mendeley Data, or not).
Collaborative and confidential:
• Researchers can keep results private, collaborate with their group, or publish protocols to the world
• Secure location in the cloud
Institutional edition (planned):
• Hivebench installed locally, on institutional server in secure offline environment
• Log-in with institutional credentials
• Tracking and reporting of metrics at group/individual level
| 26
Mendeley Data Features (today and tomorrow)
Trusted Data Repository
• Publish data under embargo: full control of visibility of datasets before and after publication
• Once published, DOI is assigned
• Published datasets stored (and accessible) in perpetuity in the DANS archive
• Data Seal of Approval certification
Flexible and Easy to Use
• Simple and intuitive user interface (à la Dropbox, Google Docs)
• Version management for longitudinal studies: new DOI for each version, enable version citation
• Customised metadata schemas for each research project
• Upload data directly from university file systems, other electronic lab notebooks, Dropbox etc.
• Automatic tagging of datasets with keywords using Elsevier Fingerprint Engine
Integrated into Research Ecosystem
• Integrated with Mendeley reference manager and social network used by over 3 million researchers
• Integrated with GitHub: dataset versions can be updated along with the software version
• Integrated with Hivebench ELN for end to end research lifecycle management
• Integrated with Elsevier publishing platform (Evise) used by over 1,000 scientific journals
• Link datasets with other research outputs (articles, datasets, software etc.) to increase findability and re-use
• Files can be stored in the cloud
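The "new DOI for each version" feature listed above can be pictured with a small sketch. The `Dataset` class, the `10.5555` example prefix, and the convention of appending a version number to a base DOI are assumptions made for illustration; the slides do not say how Mendeley Data actually constructs version DOIs.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Dataset:
    """A dataset whose every published version receives its own citable identifier."""

    base_doi: str  # hypothetical base identifier
    versions: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)

    def publish(self, files: Sequence[str]) -> str:
        """Record a new immutable version and return its version-specific DOI."""
        version = len(self.versions) + 1
        # Assumed scheme: base DOI plus ".<version number>".
        doi = f"{self.base_doi}.{version}"
        self.versions.append((doi, tuple(files)))
        return doi


if __name__ == "__main__":
    dataset = Dataset("10.5555/example-dataset")
    print(dataset.publish(["baseline.csv"]))
    print(dataset.publish(["baseline.csv", "followup.csv"]))
```

Because earlier versions are never overwritten, a paper that cited the first version of a longitudinal dataset keeps pointing at exactly the files it used.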
| 27
Mendeley Data Institutional Features (mostly tomorrow)
Customized for Institutions:
• Seamless integration with Pure to link research data to people, departments, publications and projects
• Customised workflows that fit the way each research project team works and the rules of your institution
• Files can be stored on institutional network file system
• Provide DOI minting using institutional prefix
• Showcase research datasets externally on a web page with institutional branding
• Provide single sign-on for researchers using existing institutional credentials
Reporting and Analysis Tools:
• Reporting on impact of datasets including views, downloads and citations
• Reporting on compliance with funder data mandates by Grant ID
• Reporting on storage space used by person, project and department to ensure operation within assigned quotas
| 28
Data Lighthouse pilot, some questions:
General Research Data Management questions:
1. How does RDM work in your institution?
2. What role do libraries, research office, researchers play, respectively?
3. Do you have an institutional data policy?
4. Which departments are the higher/lower adopters?
5. What are the RDM tools available for your researchers? How well are they used?
6. Are you aware of negative/positive factors that may influence adoption rates?
Engagement questions:
1. How do you currently engage with researchers in the RDM space?
2. What additional services do you need?
3. Does the Data Lighthouse project resonate with your needs?
4. Are there any use cases/scenarios and metrics that we haven’t thought of?
5. Can we work together to improve adoption rates of RDM tools by your researchers?
6. Where would information re RDM processes come from, what format should it have?
Pilot questions: would you be interested in e.g.:
1. Organizing a joint workshop between Research Data Management key personnel of your institution and the Elsevier RDM team to refine the current Data Lighthouse project scope and requirements?
2. Running a test emailing campaign within 1-2 departments/labs followed by phone interviews with a few librarians and active researchers?
| 29
Support for Research Data Management with Data Lighthouse (mockups)
Data Lighthouse Dashboard
• Datasets shared
• Datasets linked
• Datasets curated
• Data articles submitted
• Data articles published
• Datasets viewed
• Datasets cited
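As a rough illustration of how the dashboard counters could be filled, the sketch below tallies a list of events per metric. The metric keys mirror the mockup labels, but the event record shape and the `summarize` helper are hypothetical and do not describe how Data Lighthouse is built.

```python
from collections import Counter

# Counter names follow the mockup labels; everything else here is an assumption.
METRICS = (
    "datasets_shared",
    "datasets_linked",
    "datasets_curated",
    "data_articles_submitted",
    "data_articles_published",
    "datasets_viewed",
    "datasets_cited",
)


def summarize(events):
    """Count events per metric, reporting zero for metrics with no events."""
    counts = Counter(e["metric"] for e in events if e["metric"] in METRICS)
    return {metric: counts[metric] for metric in METRICS}


if __name__ == "__main__":
    # Made-up events for one reporting period.
    events = [
        {"metric": "datasets_shared", "department": "Chemistry"},
        {"metric": "datasets_viewed", "department": "Chemistry"},
        {"metric": "datasets_viewed", "department": "Biology"},
    ]
    for metric, value in summarize(events).items():
        print(f"{metric}: {value}")
```

Reporting explicit zeros keeps a monthly report the same shape from one period to the next, which makes it easier to compare adoption across departments.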
| 30
Links:
• RDM Projects:
• https://www.hivebench.com
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://rd-alliance.org/bof-data-search.html
• https://data.mendeley.com/
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
• https://www.elsevier.com/about/open-science/research-data
• Bourne Demo: Original Materials:
- The original research paper: Kinnings et al., 2010
- The paper describing the earlier reproducibility effort: Garijo et al., 2013
- A wiki with the reproduction attempt: Gil/Garijo, 2012
- Background materials on the reproduction efforts: Garijo, 2012
- SMAP Tool: Xie, 2010
- Protocol in Hivebench: https://www.hivebench.com/protocols/16483
- Experiment in Hivebench: https://www.hivebench.com/notebooks/8524/experiments/20562
- Data in Mendeley Data: https://data.mendeley.com/datasets/r69mvkckmn/draft?preview=1
- MethodsX Paper, with links to protocols and data: http://www.articleofthefuture.com/methodsx.html

Elsevier's RDM Program: Habits of Effective Data and the Bourne Ultimatum

  • 1. | 1 Anita de Waard, VP Research Data Collaborations Elsevier RDM Services a.dewaard@elsevier.com December 19, 2016 Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
  • 2. | 2 https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data 10.Integrateupstreamanddownstream –makemetadatatoserveuse. Save Share Use 9. Re-usable (allow tools to run on it) 8. Reproducible 7. Trusted (e.g. reviewed) 6. Comprehensible (description / method is available) 5. Citable 4. Discoverable (data is indexed or data is linked from article) 3. Accessible 1. Stored (existing in some form) 2. Preserved (long-term & format-independent) A Maslow Hierarchy for Research Data:
  • 3. | 3 Store, Preserve: Data Rescue Award
  • 5. | 5 https://data.mendeley.com/ Linked to published papers – or not Linked to Github – or not Versioning and provenance tracking Store, Access: Mendeley Data Different Licenses: GNU-PL, CC-BY CC0, etc
  • 6. | 6 Access, Cite: Data Linking • Integrated in paper submission process • Supplementary data is never behind a firewall • Closely integrated with > 150 databases:
  • 7. | 7 Access, Discover: Scholix/DLIs • ICSU-WDS/RDA Publishing Data Service Working group, merged with National Data Service pilot • Cross-stakeholder – with input from CrossRef, DataCite, OpenAIRE, Europe PubMed Central, ANDS, PANGAEA, Thomson Reuters, Elsevier, and others • Proposed long-term architecture and interoperability framework: www.scholix.org • Operational prototype at http://dliservice.research-infrastructures.eu/#/api (including 1.4 Million links from various sources)
  • 10. | 10 Data articles Software articles Method articles Protocols Video articles Hardware articles Lab resources Full Research paper • Brief article types designed to communicate a specific element of the research cycle • Complementary to full research papers • Easy to prepare and submit • Peer-reviewed and indexed • Receive a DOI and fully citable • Allow citable post-publication updates • Primarily Open Access (CC-BY) • Published in Multidisciplinary and domain-specific journals https://www.elsevier.com/books-and-journals/research-elements Review: Research Elements
  • 11. | 11 • Cortex Registered Reports: • Method and proposed analysis are submitted for pre-registration • Paper is conditionally accepted • Research is executed • Full paper submitted, accepted provided that protocol is followed • Reproducibility Papers: • Describes all the software and data used to derive the published results, as well as provides instructions on how to reproduce and validate such results. • Using Mendeley Data, authors also submit their code, data, and optionally a ReproZip package or a Docker container to make the review process easier. • Reviewers not only review the reproducibility paper, but also validate the results and claims published in the original manuscript. • Once the paper is accepted, (non-blind) reviewers also become co-authors and are encouraged to add a section in the paper that states the extent to which the software is portable, is robust to changes, and is likely to be usable. Reproduce: Some Journal Efforts:
  • 12. | 12 Research article published Initial inquiry Share, publish and link data Monitor progress and provide guidance Generate reports 111110 00011 1101110 0000 001 10011 1 011100 101 What? • Service for Research Institutes (esp. librarians) to engage with researchers throughout the research data life cycle. How? Offer service for Librarians to interact with researchers regarding the RDM Process to: • Offer solutions to store, share, link and publish data • Monitor progress report on posting, citation, downloads of dataset • Provide monthly reportingDATA LIGHTHOUSE Metrics for Institutions: Data Lighthouse
  • 13. | 13 10.Integrateupstreamanddownstream –makemetadatatoserveuse. Save Share Use 9. Re-usable 8. Reproducible 7. Trusted 6. Comprehensible 5. Citable 4. Discoverable 3. Accessible 1. Stored 2. Preserved https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data Data at Risk Reproducibility Initiative Data Lighthouse In summary: Elsevier Efforts Collaborative Efforts
  • 14. | 14 “Now show me how all of this works together… on one of my papers!” • Phil Bourne, August 2016 See Demo
  • 15. | 15 A Tale of (Ir)reproducibility There once was a computational biology paper… Kinney et al. 2010, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000976
  • 16. | 16 A Tale of (Ir)eproducibility ... that couldn’t be (easily) reproduced.
  • 17. | 17 A Tale of (Ir)eproducibility Some brave souls did reproduce it … Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
  • 18. | 18 A Tale of (Ir)eproducibility … but it was a lot of work. Daniel Garijo, Sarah Kinnings, Li Xie, Lei Xie, Yinliang Zhang, Philip E. Bourne, Yolanda Gil (2013). Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome, http://dx.doi.org/10.1371/journal.pone.0080278
  • 19. | 19 Some tools to improve this: 1. Store protocols in an Electronic Lab Notebook. Keep collection of protocols online Edit, export, share
  • 20. | 20 Some tools to improve this: 2. Run experiments from this Lab Notebook. Edit, export, share Base on saved Protocols Save and Export Outputs
  • 21. | 21 Some tools to improve this: 3. Export results to a trusted data repository. Describe how exoeriment can be reproduced Keep track of versions of dataset Create DOI for Citation Link back to protocols Store up to 5 GB of data in many formats
  • 22. | 22 Some tools to improve this: 4. Publish in a data journal & link back. Journal focuses on Method reporiduction Link to protocols Link to Data Fully OA
  • 23. | 23 The Moral of this Story: • How are we improving the ‘old way of working’? - Methods and data can be stored by researchers directly during the experiment, so the 270 hours of reproduction > 0 (given that the protocol is stored for reuse during the experiment) - Better reproducibility because tools and methods are stored innately, no need to recap, rebuild, and recover - More accurate workflow representation because progress is tracked while it happens, not just afterwards • Are we there yet? - We’re getting somewhere: “Your tools […], layer a UI on top of a whole set of disjointed components; this is ultimately what people want!” Phil Bourne, ADDS NIH - But we’re not quite there: o Need to run code from the tools, planned o Even easier exporting/publishing workflows planned o Integration with other tools: ELNs, (institutional) repositories, journals, sharing platforms planned.
  • 24. | 24 A development partnership proposal: 1. You try out our tools: - Institutional install for Hivebench - Installation of Mendeley Data (in the cloud, later on local service) - If interested: Data Lighthouse pilot 2. In return, you help us explore what these tools will look like: - Connect to Pis/Postdocs/Grad Students who are interested in trying out Hivebench/Mendeley Data - We ask them for feedback on the tools, help with any issues - You explore Data Lighthouse, tell us what you would like to see in terms of reporting/emails etc. 3. Timeframe: - Start by signing an MoU (no money changes hands; we provide services/software/support, you help connect us to researchers, provide feedback) - We evaluate collaboration after 6 months, see if anything needs to change - Tools are free for 24 months, no other obligations.
  • 25. | 25 Hivebench Features: Fully-fledged electronic online notebook. Allows researchers to manage: • Experiments, • Protocols, • Reagents, • Research Data (integrated with Mendeley Data, or not). Collaborative and confidential: • Researchers can keep results private, or collaborate with group, or world to publish protocols • Secure location in the cloud Institutional edition (planned): • Hivebench installed locally, on institutional server in secure offline environment • Log-in with institutional credentials • Tracking and reporting of metrics at group/individual level
  • 26. | 26 Mendeley Data Features (today and tomorrow) Trusted Data Repository • Publish data under embargo: full control of visibility of datasets before and after publication • Once published, DOI is assigned • Published datasets stored (and accessible) in perpetuity in the DANS archive • Data Seal of Approval certification Flexible and Easy to Use • Simple and intuitive user interface (a la Drop Box, Google Docs) • Version management for longitudinal studies: new DOI for each version, enable version citation • Customised metadata schemas for each research project • Upload data directly from university file systems, other electronic lab notebooks, Dropbox etc. • Automatic tagging of datasets with keywords using Elsevier Fingerprint Engine Integrated into Research Ecosystem • Integrated with Mendeley reference manager and social network used by over 3 million researchers • Integrated with Github, versioning can be updated with software version • Integrated with Hivebench ELN for end to end research lifecycle management • Integrated with Elsevier publishing platform (Evise) used by over 1,000 scientific journals • Link datasets with other research outputs (articles, datasets, software etc.) to increase findability and re- use • Files can be stored in the cloud
  • 27. | 27 Mendeley Data Institutional Features (mostly tomorrow) Customized for Institutions: • Seamless integration with Pure to link research data to people, departments, publications and projects • Customised workflows that fit the way each research project team works and the rules of your institution • Files can be stored on institutional network file system • Provide DOI minting using institutional prefix • Showcase research datasets externally on a web page with institutional branding • Provide single sign-on for researchers using existing institutional credentials Reporting and Analysis Tools: • Reporting on impact of datasets including views, downloads and citations • Reporting on compliance with funder data mandates by Grant ID • Reporting on storage space used by person, project and department to ensure operation within assigned quotas
  • 28. | 28 Data Lighthouse pilot, some questions: General Research Data Management questions: 1. How does RDM work in your institution? 2. What roles do the libraries, research office and researchers play, respectively? 3. Do you have an institutional data policy? 4. Which departments are the higher/lower adopters? 5. What RDM tools are available to your researchers? How well are they used? 6. Are you aware of negative/positive factors that may influence adoption rates? Engagement questions: 1. How do you currently engage with researchers in the RDM space? 2. What additional services do you need? 3. Does the Data Lighthouse project resonate with your needs? 4. Are there any use cases/scenarios and metrics that we haven’t thought of? 5. Can we work together to improve adoption rates of RDM tools by your researchers? 6. Where would information re RDM processes come from, what format should it have? Pilot questions: would you be interested in e.g.: 1. Organizing a joint workshop between Research Data Management key personnel of your institution and the Elsevier RDM team to refine the current Data Lighthouse project scope and requirements? 2. Running a test emailing campaign within 1-2 departments/labs followed by phone interviews with a few librarians and active researchers?
  • 29. | 29 Support for Research Data Management with Data Lighthouse (mockups). Data Lighthouse Dashboard: datasets shared, datasets linked, datasets curated, data articles submitted, data articles published, datasets viewed, datasets cited.
  • 30. | 30 Links: • RDM Projects: • https://www.hivebench.com • https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences • http://www.journals.elsevier.com/softwarex/ • https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking • https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html • https://rd-alliance.org/bof-data-search.html • https://data.mendeley.com/ • https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data • https://www.force11.org/ • http://www.nationaldataservice.org/ • https://rd-alliance.org/ • https://www.elsevier.com/about/open-science/research-data • Bourne Demo: Original Materials: - The original research paper: Kinnings et al., 2010 - The paper describing the earlier reproducibility effort: Garijo et al., 2013 - A wiki with the reproduction attempt: Gil/Garijo, 2012 - Background materials on the reproduction efforts: Garijo, 2012 - SMAP Tool: Xie, 2010 - Protocol in Hivebench: https://www.hivebench.com/protocols/16483 - Experiment in Hivebench: https://www.hivebench.com/notebooks/8524/experiments/20562 - Data in Mendeley Data: https://data.mendeley.com/datasets/r69mvkckmn/draft?preview=1 - MethodsX Paper, with links to protocols and data: http://www.articleofthefuture.com/methodsx.html

Editor's Notes

  • #7, #8: IUPAC has recommendations for which term to use to describe a given property, but the vocabulary is not very accessible or usable, so it is not universally implemented. Each site decides how it wants to label a given property, which hinders indexing and reuse of the data across silos. Structured capture of information using an ELN such as Hivebench enables the researcher to report data using a consistent vocabulary without extra effort.
  • #10: Chemistry data are retrievable from NIST, but only by going to their page in a browser and using their search tools. What about access from within other applications, or through assistive devices for those with vision impairment? What guarantee do we have that the data will remain accessible if government funding runs into problems?
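
The vocabulary problem raised in the notes on IUPAC terms (each site labelling the same property differently) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synonym table, the canonical term names, and the sample records are invented, and none of it reflects actual IUPAC vocabulary or how Hivebench works internally. It only shows why a shared mapping makes records from different sites comparable.

```python
# Hypothetical synonym table: labels and canonical terms are illustrative only,
# not taken from IUPAC recommendations or from any Elsevier product.
CANONICAL_LABELS = {
    "melting point": "melting_point",
    "mp": "melting_point",
    "m.p.": "melting_point",
    "boiling point": "boiling_point",
    "bp": "boiling_point",
    "b.p.": "boiling_point",
}


def normalize_label(label):
    """Return the canonical term for a site-specific label, or None if unknown."""
    # Lowercase and collapse whitespace so "  Melting   Point " still matches.
    key = " ".join(label.strip().lower().split())
    return CANONICAL_LABELS.get(key)


def normalize_record(record):
    """Split a record into (known, unknown) dictionaries.

    `known` is keyed by canonical term; `unknown` keeps the original labels
    so that unmapped properties are flagged rather than silently dropped.
    """
    known, unknown = {}, {}
    for label, value in record.items():
        canonical = normalize_label(label)
        if canonical is None:
            unknown[label] = value
        else:
            known[canonical] = value
    return known, unknown


# Two made-up records that describe the same property under different labels.
site_a = {"Melting Point": "value A", "BP": "value B"}
site_b = {"m.p.": "value C", "colour": "white"}

known_a, unknown_a = normalize_record(site_a)
known_b, unknown_b = normalize_record(site_b)
```

After normalization both records expose `melting_point`, so an index can match them across sites, while the unmapped `colour` label is reported separately for a curator to review.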