Public Data Archiving in Ecology and Evolution
How well are we doing?
Dr. Sandra A. Binning
@binsan5
Are publications the only useful research output?
What about DATA?
Do scientists have an obligation to make their data freely available?
Big push in the biological sciences for Public Data Archiving (PDA)
“The data and its analysis are the scientific product.
The paper is just an advertisement.”
Richard McElreath
McElreath R (2016) Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press: 469 pp
What is Public Data Archiving?
(Figure from Reichman et al 2011 Science)
The process of storing data and associated metadata in a repository that is open to the public, where the data can be freely accessed and downloaded by a third party.
Why do it?
• avoids data loss from hardware malfunction/obsolescence or from researchers moving on to different projects or retiring
• encourages good metadata production to ensure that datasets are interpretable
• increases the ability to evaluate and reproduce studies
• increases opportunities for teaching and learning
• encourages a stronger sharing culture
• improves the return per research dollar
• increases citations and collaborations
(Huang & Qiao 2011 TREE, Molloy 2011 PLOS Biol, Piwowar et al 2011 Nature, Reichman et al 2011 Science, Tenopir et al 2011 PLOS One, Whitlock 2011 TREE, Whitlock et al 2010 Am Nat)
Data as a public good?
Most research is paid for by TAXPAYERS, in the form of government grants and salaries.
So, who really “owns” the data?
Joint Data Archiving Policy (JDAP)
http://datadryad.org/pages/jdap
Journals that require data archiving
Examples:
• The American Naturalist
• Biological Journal of the Linnean Society
• Biology Letters
• BMC Ecology
• BMC Evolutionary Biology
• BMJ
• BMJ Open
• Ecological Applications
• Ecological Monographs
• Ecology
• Ecosphere
• Evolution
• Evolutionary Applications
• Frontiers in Ecology and the Environment
• Functional Ecology
• Genetics
• Heredity
• … http://datadryad.org/pages/jdap
Data archiving trends in Ecology & Evolution?
Data deposition has increased considerably in Dryad and other repositories.
(Vision 2013 figshare)
Membership of the JDAP consortium has tripled since its inception in 2011.
(Magee et al 2014 PLOS One)
Enforcing Public Data Archiving policies has had a positive effect on data deposition rates.
(Vines et al 2013 FASEB Journal, Magee et al 2014 PLOS One)
The problem…
Many researchers harbour concerns about making their data publicly available. This is particularly true in fields such as ecology and evolutionary biology, where datasets are often complex, have a long shelf life, and can be used to test multiple hypotheses.
Why are researchers reluctant to archive/share their data?
• Proper data archiving takes time (away from publishing).
• Competition for publications - fear of being “scooped”.
• Concerns about data misinterpretation / misuse.
• Lack of recognition for Public Data Archiving.
Benefits vs. Costs
• avoids data loss from hardware malfunction/obsolescence or from researchers moving on to different projects or retiring
• encourages good metadata production to ensure that datasets are interpretable
• increases the ability to evaluate and reproduce studies
• increases opportunities for teaching and learning
• encourages a stronger sharing culture
• improves the return per research dollar
• increases citations and collaborations
• funded by taxpayers
Good for the scientific community, but the costs are borne by individual researchers.
“63% of PIs were against PDA as currently required”
“41% of respondents said that they have avoided publishing in journals that require [PDA]”
“53% intend to avoid publishing in [journals requiring PDA] in the future”
“A key concern is that [PDA] will be a disincentive both for the initiation of long-term studies, and for maintenance of ongoing studies.”
Are we filling up ‘empty archives’?
(Nelson 2009 Nature)
Most journals and databases do not verify the quality of archived data beyond basic checks, such as ensuring that the paper provides a data availability statement and a valid DOI.
(Noor et al 2006 PLOS Biol, Costello et al 2013 TREE)
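As a toy illustration of why basic checks can still produce "empty archives", the Python sketch below contrasts a shallow, text-level check (does the paper mention data availability and contain something shaped like a DOI?) with a slightly deeper look at the deposited files. The DOI regular expression, the set of accepted open formats, the README rule, and the example text, DOI and filenames are all assumptions made for illustration; the talk does not describe how any journal or repository implements its checks.

```python
import re
from pathlib import PurePath
from typing import Iterable

# Rough DOI shape (assumption, not a full validator):
# "10." + 4-9 digit registrant code + "/" + non-space suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")

# Hypothetical set of open, non-proprietary formats treated as reusable.
OPEN_FORMATS = {".csv", ".txt"}


def passes_basic_checks(paper_text: str) -> bool:
    """Shallow check: a data availability statement plus a DOI-like string."""
    has_statement = "data availability" in paper_text.lower()
    has_doi = DOI_PATTERN.search(paper_text) is not None
    return has_statement and has_doi


def archive_looks_complete(filenames: Iterable[str]) -> bool:
    """Slightly deeper check on the deposited files themselves:
    requires a README plus at least one data file in an open format."""
    names = [PurePath(f).name.lower() for f in filenames]
    has_readme = any(n.startswith("readme") for n in names)
    has_open_data = any(
        PurePath(n).suffix in OPEN_FORMATS and not n.startswith("readme")
        for n in names
    )
    return has_readme and has_open_data


if __name__ == "__main__":
    # Hypothetical paper text and archive contents.
    paper = "Data availability: data are archived in Dryad (doi:10.1234/example.5678)."
    archive = ["analysis_output.sav"]  # proprietary format, no description
    print(passes_basic_checks(paper))       # True: passes the shallow check
    print(archive_looks_complete(archive))  # False: nothing reusable deposited
```

The point of the contrast is that the first function only inspects the paper, so it cannot tell a complete, documented archive from one holding a single proprietary file.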
What’s happening in molecular biology?
It’s not looking good…
1) Ioannidis et al 2008 Nat Gen: review of microarray studies
- only 2 of 18 were reproducible
2) Gilbert et al 2014 Mol Ecol: review of population genetics studies
- 30% of analyses irreproducible
- 35% of datasets insufficiently described
PDA in E&E – how well are we doing?
We assessed 100 non-molecular studies in journals that have either adopted the Joint Data Archiving Policy (JDAP) or have a strong data archiving policy.
Completeness criterion
Reusability criterion
Joint Data Archiving Policy (JDAP)
“data supporting the results in the paper should be archived in an appropriate public archive”
http://datadryad.org/pages/jdap
Data completeness score
Legend: meets JDAP requirements / does not meet JDAP requirements
(Roche et al 2015; PLOS Biol)
Data reusability score
(Roche et al 2015; PLOS Biol)
Bad archiving examples
• SPSS files archived (a proprietary format)
• Files archived in a language other than English, with no metadata
• Too much data!
• Only data (no description)
• Principal components without raw data
Data completeness - results
More than half (56%) of studies did not meet the minimum requirements of the JDAP or of strong archiving policies.
(Roche et al 2015; PLOS Biol)
Data reusability - results
Even more studies (64%) were archived in a way that partially or entirely prevented reuse.
(Roche et al 2015; PLOS Biol)
How do we increase high-quality participation?
1. Encourage communication between data generators and re-users
2. Disclose data re-use ethics
3. Encourage increased recognition of publicly archived data
4. Facilitate more flexible embargoes on archived data
(Roche et al 2014 PLOS Biol)
How do we increase high-quality participation?
Key recommendations to improve PDA practices
• Be mindful of PDA
• Provide detailed metadata
• Use descriptive file names
• Archive unprocessed data
• Use standard file formats (e.g. .txt, .csv)
• Facilitate data aggregation
• Perform quality control
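Several of these recommendations (archive unprocessed data, use standard file formats, provide detailed metadata, use descriptive file names, perform quality control) can be chained into one small workflow. The Python sketch below uses only the standard library; the dataset, column names, units, the positive-length plausibility rule, and the file stem are all hypothetical, chosen to illustrate the steps rather than taken from the talk or from any journal's requirements.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical raw (unprocessed) observations; values are illustrative only.
RECORDS = [
    {"site_id": "A1", "date": "2015-03-02", "species": "species_a", "body_length_mm": 41.2},
    {"site_id": "A1", "date": "2015-03-02", "species": "species_a", "body_length_mm": 38.7},
    {"site_id": "B2", "date": "2015-03-04", "species": "species_b", "body_length_mm": 35.0},
]

# Detailed metadata: a plain-language description (with units) for every column.
COLUMN_METADATA = {
    "site_id": "Field site code",
    "date": "Sampling date, ISO 8601 (YYYY-MM-DD)",
    "species": "Species label",
    "body_length_mm": "Body length in millimetres",
}


def quality_check(records, metadata, numeric_cols=("body_length_mm",)):
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for i, row in enumerate(records, start=1):
        # Each row must contain exactly the documented columns.
        if set(row) != set(metadata):
            problems.append(f"row {i}: columns do not match metadata")
            continue
        for col in numeric_cols:
            value = row[col]
            # Hypothetical plausibility rule: lengths must be positive numbers.
            if not isinstance(value, (int, float)) or value <= 0:
                problems.append(f"row {i}: implausible {col}={value!r}")
    return problems


def write_archive(records, metadata, out_dir, stem):
    """Write the data as CSV plus a plain-text README describing each column."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data_path = out / f"{stem}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(metadata))
        writer.writeheader()
        writer.writerows(records)
    readme_path = out / f"{stem}_README.txt"
    lines = [f"Data file: {data_path.name}", "Columns:"]
    lines += [f"  {col}: {desc}" for col, desc in metadata.items()]
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return data_path, readme_path


if __name__ == "__main__":
    issues = quality_check(RECORDS, COLUMN_METADATA)
    if issues:
        print("\n".join(issues))
    else:
        with tempfile.TemporaryDirectory() as tmp:
            # Descriptive (hypothetical) file stem: what, when, processing level.
            for p in write_archive(RECORDS, COLUMN_METADATA, tmp, "fish_body_length_2015_raw"):
                print(p.name)
```

Running the quality check before anything is written means problems are fixed at the source rather than discovered by a re-user, and keeping the README as plain text next to the CSV means the metadata stays readable in the same open format as the data.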
Public Data Archiving: The way forward?
• Not everyone is on board
• “Empty archives” are a problem in E&E
  • Willful omission
  • Lack of knowledge
• Solutions
  • Acknowledge fears and try to alleviate them
  • Enforcement, reward, flexibility
  • Educate researchers about best practices
  • Recognize individual efforts to increase transparency
Many thanks to Ainsley Seago, Luke Holman, Scott Keogh, Pat Backwell, Andrew Cockburn, Todd Vision, Mark Hahnel, the Evolutionary Ecology Reading group at the Australian National University and the Eco-Ethology and Cognitive Sciences lab groups at the University of Neuchâtel.
Image / illustration credits: A. Seago, Google
