Show me the data!
Data peer review at Scientific Data
Varsha Khodiyar, Scientific Data
30.03.2017
Scientific Data, a Nature Research journal
• Data Descriptor: primary article type; reports sound science and facilitates data reuse
• Analysis: new analyses or meta-analyses of existing data
• Article: original reports on advances in data sharing & reuse
• Comment: announcements of broad interest; usually invited
www.nature.com/scientificdata
Under the hood of a Data Descriptor
• Context for data generation (background)
• How was data generated?
• How was data processed?
• Where is the data?
• Synthesis
• Analysis
• Conclusions
A key principle of publishing at Scientific Data
Wilkinson, M.D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). doi:10.1038/sdata.2016.18
Findable – (meta)data is uniquely and persistently identifiable.
Accessible – data is reachable and accessible by humans and machines, using standard formats and protocols.
Interoperable – (meta)data is machine readable and annotated with resolvable vocabularies and ontologies.
Reusable – (meta)data is sufficiently well described to allow integration with compatible data.
Data Descriptors have human- and machine-understandable components
Human-readable representation of the study, i.e. the article (HTML & PDF)
Data Descriptors have human- and machine-understandable components
Machine-accessible representation of the study, i.e. the metadata
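The following minimal Python sketch shows what a machine-accessible representation can look like. It emits a dataset record as JSON using schema.org `Dataset` terms. This format is an assumption chosen for illustration: the journal's own metadata format is not described here. Every value below (title, DOI, download URL, variables) is hypothetical.

```python
import json


def build_dataset_metadata(name, doi, description, license_url, download_url, variables):
    """Return a minimal schema.org-style Dataset record as a plain dict.

    `variables` is a list of (variable name, unit) pairs.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        # A persistent identifier supports findability.
        "identifier": f"https://doi.org/{doi}",
        "description": description,
        # An explicit licence tells others how the data may be reused.
        "license": license_url,
        # Where machines (and humans) can retrieve the files.
        "distribution": [{"@type": "DataDownload", "contentUrl": download_url}],
        # Measured variables with their units, so values can be interpreted.
        "variableMeasured": [
            {"@type": "PropertyValue", "name": var, "unitText": unit}
            for var, unit in variables
        ],
    }


# Hypothetical example values; not taken from any real Data Descriptor.
record = build_dataset_metadata(
    name="Example blood chemistry measurements",
    doi="10.5555/example.dataset.1",
    description="Hypothetical dataset used only to illustrate metadata structure.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://repository.example.org/datasets/1/files.zip",
    variables=[("Blood urea nitrogen", "mg/dL"), ("Blood carbon dioxide", "mmol/L")],
)
print(json.dumps(record, indent=2))
```

Software can parse a structured record like this without reading the article text. That is the role the machine-accessible component plays alongside the human-readable article.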
What types of data can be published?
• Decades-old dataset
• Standalone dataset
• Data that has been used in an analysis article
• Large consortium dataset
• Data from a single experiment
• Data associated with a high-impact analysis article
Any data that the researcher finds valuable and that others might find useful too
When can a Data Descriptor be published?
Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data:
• Before the analysis has been published
• Alongside the analysis article
• After the data analysis has been published
• When the authors do not intend to analyse the data
Why peer review data?
Researchers are sharing and reusing data
• Direct contact between researchers (on request) is the most common way of sharing data
• Repositories are the second most common method of sharing
Why might direct contact be the preferred method?
Fig 2A & C; Kratz and Strasser, PLOS ONE (2015) doi: 10.1371/journal.pone.0117619
Researchers see peer review as a mark of data quality
• Respondents trust peer review above all else: 72% (n = 175) say peer review confers high or complete confidence in the data
Figure 6B; Kratz and Strasser, PLOS ONE (2015) doi: 10.1371/journal.pone.0117619
How is data peer reviewed at Scientific Data?
Editorial office
• Susanna-Assunta Sansone, Honorary Academic Editor
• Andrew L. Hufton, Managing Editor
• Varsha K. Khodiyar, Data Curation Editor
Selection of Editorial Board members
• Experts in their discipline
AND
• Demonstrable experience of data standards, data reuse or data analysis in their discipline
www.nature.com/sdata/about/editorial-board#eb
Data peer review
www.nature.com/sdata/policies/for-referees
Experimental Rigor and Technical Data Quality
• Were data produced in a sound manner?
• Technical quality of data – appropriate statistical analyses?
• Experimental rigor – appropriate depth, coverage?
Completeness of the Description
• Sufficient detail to allow others to reproduce these steps?
• Sufficient detail to allow others to reuse this data?
• Consistent with relevant minimum reporting standards?
Integrity of the Data Files and Repository Record
• Do data files appear complete and match manuscript descriptions?
• Are data archived to the most appropriate repository?
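Part of the check that data files appear complete and match the manuscript can be automated. The Python sketch below assumes the reviewer has downloaded the files into a local folder and has a list of file names taken from the manuscript. It reports missing, unexpected and empty files, and it records SHA-256 checksums so the same files can be re-verified later. The function names and example files are hypothetical; this is not a tool used by the journal.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 checksum, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_data_files(data_dir, expected_files):
    """Compare the files named in a manuscript with the files actually present.

    Returns (missing, unexpected, empty, checksums); paths are relative to
    `data_dir` and use forward slashes.
    """
    data_dir = Path(data_dir)
    present = {
        p.relative_to(data_dir).as_posix() for p in data_dir.rglob("*") if p.is_file()
    }
    expected = set(expected_files)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected)
    empty = sorted(name for name in present if (data_dir / name).stat().st_size == 0)
    checksums = {name: sha256sum(data_dir / name) for name in sorted(expected & present)}
    return missing, unexpected, empty, checksums


# Hypothetical example: one described file is missing and an empty extra file exists.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "blood_chemistry.csv").write_bytes(b"patient_id,bun_mg_dl\n1,14\n")
    Path(tmp, "notes.txt").write_bytes(b"")
    missing, unexpected, empty, checksums = check_data_files(
        tmp, ["blood_chemistry.csv", "imaging_metadata.csv"]
    )

print("Missing:", missing)
print("Unexpected:", unexpected)
print("Empty:", empty)
print("Checksums:", checksums)
```

A script like this only confirms that files exist and are non-empty. Whether their contents match the manuscript descriptions still needs a human reviewer.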
Metadata curation and final data checking
We capture metadata about the dataset being described in each Data Descriptor. During the metadata curation process:
• The manuscript is re-read
• The data archive is checked
• Minor issues with the data and/or manuscript are often identified
Why a Data Descriptor may be rejected
Reject without review
• Out of scope or no data present
Reject after review
• Serious flaws in the study design, e.g. lack of crucial controls
• Serious issues identified in the data files by the peer reviewers
After rejection
• Address concerns and resubmit to Scientific Data
• Resubmit to another data journal
• Withdraw data from Scientific Data integrated repositories
Data should be technically reliable and suitable for use by others
Ensuring your data is peer-review ready
Create a data management plan
• Can avoid problems later
• Increasingly required by funders
• Critically evaluate existing practices – you may be setting standards for your field
• Some aspects of best practice may incur costs
• Find people and resources that can help you
[Figure: Datasets, Code, Metadata, Research paper; Nature Genetics]
Archive your data to the most appropriate repository
We currently list around 90 repositories across the biological, medical, physical and social sciences: www.nature.com/sdata/policies/repositories
Considerations:
1. Is there a discipline- or data-specific repository for your data?
2. If no discipline- or data-specific repository exists for your data, does your funder or institution mandate deposition to a particular repository?
Spot the mistakes
• Unhelpful document name
• Formatting used to convey information
• Special characters can cause text-mining errors
• Meaningless column titles
• Undefined abbreviation
• No units are given
Increasing intelligibility
• Self-explanatory document name
• Removed cell formatting
• Removed special characters
• Meaningful column titles
• Defined ‘BUN’
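Some of the header problems above can be flagged automatically before submission. The Python sketch below checks a CSV header row for three problems: generic column titles, characters outside a conservative ASCII set, and capitalised abbreviations that have not been declared. The heuristics, the allowed character set and the example tables are assumptions for illustration, not Scientific Data requirements.

```python
import csv
import io
import re

# Titles such as "Column1", "var2", "Unnamed: 0" or a single letter carry no meaning.
GENERIC_TITLE = re.compile(r"^(column\s*\d+|col\d+|var\d+|unnamed.*|x\d*|[a-z])$", re.IGNORECASE)
# Anything outside this conservative set (e.g. "*", "µ", "°") is flagged as special.
SPECIAL_CHAR = re.compile(r"[^A-Za-z0-9_ ()/.%-]")
# Runs of two or more capital letters are treated as possible abbreviations.
ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")


def lint_header(csv_text, known_abbreviations=()):
    """Return warnings about the header row of a CSV table."""
    header = next(csv.reader(io.StringIO(csv_text)))
    warnings = []
    for title in header:
        stripped = title.strip()
        if not stripped or GENERIC_TITLE.match(stripped):
            warnings.append(f"meaningless column title: {title!r}")
        if SPECIAL_CHAR.search(stripped):
            warnings.append(f"special characters may cause text-mining errors: {title!r}")
        for abbrev in ABBREVIATION.findall(stripped):
            if abbrev not in known_abbreviations:
                warnings.append(f"undefined abbreviation {abbrev!r} in {title!r}")
    return warnings


# Hypothetical tables, loosely modelled on the mistakes listed above.
messy = "Column1,BUN,Glucose (mg/dL)*,µmol\n1,14,90,3\n"
tidy = "patient_id,blood_urea_nitrogen,glucose_mg_per_dl\n1,14,90\n"

for warning in lint_header(messy):
    print(warning)
print(lint_header(tidy))
```

Passing `known_abbreviations={"BUN"}` silences the abbreviation warning once the term is defined in the table legend or methods.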
Increasing assessability
• Information that was asterisked is now added to the results section
• Added Units column
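Where units are embedded in column titles, they can be moved into an explicit units column. This Python sketch assumes headers of the form "Name (unit)" and a key column called `sample_id`; both are hypothetical conventions chosen for the example. It reshapes the table into one measurement per row, which is only one of several reasonable layouts.

```python
import csv
import io
import re

# Matches headers such as "Blood urea nitrogen (mg/dL)".
UNIT_IN_HEADER = re.compile(r"^\s*(?P<name>.+?)\s*\((?P<unit>[^()]+)\)\s*$")


def add_units_column(csv_text, key="sample_id"):
    """Reshape a wide table into rows of key, variable, value and units."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        key_value = record.pop(key)
        for header, value in record.items():
            match = UNIT_IN_HEADER.match(header)
            if match:
                name, unit = match["name"], match["unit"]
            else:
                name, unit = header.strip(), ""  # no unit found in the title
            rows.append({key: key_value, "variable": name, "value": value, "units": unit})
    return rows


def to_csv(rows, key="sample_id"):
    """Serialise the reshaped rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[key, "variable", "value", "units"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical wide table with units hidden in the column titles.
wide = "sample_id,Blood urea nitrogen (mg/dL),Blood carbon dioxide (mmol/L)\nS1,14,24\nS2,18,26\n"
print(to_csv(add_units_column(wide)), end="")
```

Headers without a parenthesised unit get an empty units cell rather than a guessed one, so gaps stay visible to the author.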
Increasing reusability
Additional information to be added to the methods section or table legend
Increasing reproducibility
• Include any additional information needed to understand the data, methods and parameters, e.g. which instrument (make and model) was used to measure blood carbon dioxide levels?
• Include availability statements for any code that was used to view, parse or analyse the data in support of the conclusions.
Reporting Guidelines
What happens when data is shared well?
Data reuse by other researchers in the same field
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
Data reuse by the non-research community
www.bbc.co.uk/news/science-environment-33057402
Data reuse by the non-research community
http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
Data peer review at Scientific Data
Data Archive
• Checked multiple times
• Scientific reasoning underlying data reviewed by active researchers
• Technical validity reviewed by discipline experts
Data Citations
• Citation accuracy confirmed by specialist editor
• Citation format checked by editorial team
• Data linkage tested by production team
Data Peer Review
• Does not have to be onerous
• Can save overall reviewing time
• Results in data that is reusable and useful!
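Checking citation format can be partly automated. The Python sketch below strips common DOI prefixes and tests each DOI against a regular expression that Crossref has published as matching most modern DOIs. It checks syntax only: some older DOIs with unusual characters may be flagged. It does not test whether a DOI actually resolves, which is what linkage testing establishes. The function names are hypothetical; the example DOIs are the two cited earlier in this deck, plus one deliberately broken copy.

```python
import re

# Pattern published by Crossref for modern DOIs; some older DOIs will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(text):
    """Strip surrounding whitespace and a leading resolver URL or 'doi:' label."""
    text = text.strip()
    for prefix in PREFIXES:
        if text.lower().startswith(prefix):
            return text[len(prefix):].strip()
    return text


def malformed_dois(citations):
    """Return the citations whose DOI does not match the expected syntax."""
    return [c for c in citations if not DOI_PATTERN.match(normalise_doi(c))]


citations = [
    "doi:10.1038/sdata.2016.18",
    "doi: 10.1371/journal.pone.0117619",
    "https://doi.org/10.1038/sdata.2016.18",
    "10.1038/ sdata.2016.18",  # stray space inside the DOI: flagged
]
print(malformed_dois(citations))
```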
Thank you!
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData