What is
Reproducibility?
The R* brouhaha
(and how Research Objects
can help)
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute, UK
ELIXIR-UK, FAIRDOM Association e.V.
carole.goble@manchester.ac.uk
First International Workshop on Reproducible Open Science @ TPDL, 9 Sept 2016, Hannover, Germany
Acknowledgements
• Dagstuhl Seminar 16041, January 2016
– http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
• ATI Symposium Reproducibility, Sustainability and Preservation, April 2016
– https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/
– https://osf.io/bcef5/files/
• C. Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Norman Morrison
“When I use a word,” Humpty Dumpty
said in rather a scornful tone, “it means
just what I choose it to mean - neither
more nor less.”
Carroll, Through the Looking Glass
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
robustness
tolerance
verification
compliance
validation
assurance
remix
Reproducibility of
Reproducibility Research
Computational Science
http://tpeterka.github.io/maui-project/
From: The Future of Scientific Workflows, Report of DOE Workshop 2015,
http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
1. Observational,
experimental
2. Theoretical
3. Simulation
4. Data intensive
BioSTIF
Computational
Science
Scientific publications goals:
(i) announce a result
(ii) convince readers it's correct.
Papers in experimental science
should describe the results and
provide a clear enough protocol to
allow successful repetition and
extension.
Papers in computational science
should describe the results and
provide the complete software
development environment, data
and set of instructions which
generated the figures.
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the
Experimental Life (1985) Shapin and Schaffer.
Jill Mesirov
David Donoho
Datasets, Data collections
Standard operating procedures
Software, algorithms
Configurations,
Tools and apps, services
Codes, code libraries
Workflows, scripts
System software
Infrastructure
Compilers, hardware
Systems of
Systems
Heterogeneous hybrid
patchwork of tools and
service evolving over time
10 “Simple” Rules for Reproducible
Computational Research: RACE
1. For Every Result, Keep Track of How It Was
Produced
2. Avoid Manual Data Manipulation Steps
3. Archive the Exact Versions of All External
Programs Used
4. Version Control All Custom Scripts
5. Record All Intermediate Results, When Possible in
Standardized Formats
6. For Analyses That Include Randomness, Note
Underlying Random Seeds
7. Always Store Raw Data behind Plots
8. Generate Hierarchical Analysis Output, Allowing
Layers of Increasing Detail to Be Inspected
9. Connect Textual Statements to Underlying
Results
10. Provide Public Access to Scripts, Runs, and
Results
Sandve GK, Nekrutenko A, Taylor J, Hovig E (2013) Ten Simple Rules for Reproducible
Computational Research. PLoS Comput Biol 9(10): e1003285. doi:10.1371/journal.pcbi.1003285
Record
Everything
Automate
Everything
Contain
Everything
Expose
Everything
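As a rough illustration of rules 1, 3 and 6 and of "Record Everything", the sketch below stores a small provenance record next to a result. It uses only the Python standard library. The function name `run_with_provenance`, the record fields and the toy analysis are hypothetical and are not taken from the cited paper.

```python
import hashlib
import json
import platform
import random
import sys


def run_with_provenance(data, seed):
    """Run a toy stochastic analysis and return (result, provenance record).

    Hypothetical helper, for illustration only.
    """
    rng = random.Random(seed)        # rule 6: fix and note the random seed
    sample = rng.sample(data, k=3)   # toy "analysis": mean of a random sample
    result = sum(sample) / len(sample)

    record = {
        "seed": seed,
        "python_version": sys.version.split()[0],  # rule 3: note exact versions
        "platform": platform.platform(),
        # Hash of the input so a later run can check it used the same data.
        "input_sha256": hashlib.sha256(
            json.dumps(data).encode("utf-8")
        ).hexdigest(),
        "result": result,                          # rule 1: result plus how it arose
    }
    return result, record


if __name__ == "__main__":
    _, record = run_with_provenance(list(range(100)), seed=42)
    print(json.dumps(record, indent=2))
```

Running the function twice with the same seed and data on the same machine should give identical records; a differing record shows which recorded element changed.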
Preparation pain
independent testing trials and tribulations
[Norman Morrison]
replication hostility no funding, time, recognition, place to publish
resource intensive access to the complete environment
Lab Analogy: Witnessing “Datascopes”
Input Data
Software
Output Data
Config
Parameters
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref resources
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
“Micro” Reproducibility
“Macro” Reproducibility
Fixity
Validate
Verify
Trust
Repeat, Replicate, Robust
[C. Titus Brown]
https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.html
Why the differences?
Reproduce, Trust
“an experiment is reproducible until
another laboratory tries to repeat it”
Alexander Kohn
Repeatability:
“Sameness”
Same result
1 Lab
1 experiment
Reproducibility:
“Similarity”
Similar result
> 1 Lab
> 1 experiment
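The contrast between "sameness" and "similarity" can be sketched as two different comparisons. This is only an illustration: the function names and the tolerance `rel_tol=1e-2` are assumptions, and a real study would have to justify its own threshold.

```python
import math


def is_repeat(original, rerun):
    """Repeatability in the slide's sense: the very same result."""
    return original == rerun


def is_similar(original, other, rel_tol=1e-2):
    """Reproducibility in the slide's sense: a similar result.

    rel_tol is an assumed, study-specific threshold, not a standard value.
    """
    return math.isclose(original, other, rel_tol=rel_tol)


if __name__ == "__main__":
    original = 0.7312    # result from the original lab (made-up number)
    same_lab = 0.7312    # rerun in the same lab
    other_lab = 0.7290   # independent run elsewhere (made-up number)
    print(is_repeat(original, same_lab))
    print(is_repeat(original, other_lab))
    print(is_similar(original, other_lab))
```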
Validate
Verify
Method
Reproducibility
the provision of
enough detail about
study procedures and
data so the same
procedures could, in
theory or in actuality,
be exactly repeated.
Result Reproducibility
(aka replicability)
obtaining the same
results from the
conduct of an
independent study
whose procedures are
as closely matched to
the original experiment
as possible
What does research reproducibility mean? Steven N. Goodman, Daniele Fanelli, John
P. A. Ioannidis. Science Translational Medicine 8 (341), 341ps12.
[doi: 10.1126/scitranslmed.aaf5027]
http://stm.sciencemag.org/content/scitransmed/8/341/341ps12.full.pdf
Productivity
Track differences
Validate
Verify
reviewers want additional work
statistician wants more runs
analysis needs to be repeated
post-doc leaves,
student arrives
new/revised datasets
updated/new versions of
algorithms/codes
sample was contaminated
better kit - longer simulations
new partners, new projects
Personal & Lab
Productivity
Public Good
Reproducibility
“Datascope” Lab Analogy
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref datasets
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
Form
Function
“Datascope” Practicalities
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref datasets
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
Living Dependencies
Science,
methods,
datasets
questions stay,
answers change
breakage, labs
decay, services and
techniques come
and go, new
instruments,
updated datasets,
services, codes,
hardware
One offs, streams,
stochastics,
sensitivities,
scale, non-portable
data
black boxes
supercomputer
access
non-portable
software
licensing restrictions
unreliable resources
black boxes
complexity
T1 T2
evolving ref datasets,
new simulation codes
Environment
Archived vs Active
Contained vs Distributed
Regimented vs Free-for-all
Who owns the dependencies?
Dependencies -> Manage
Black boxes -> Expose
Dynamics -> Fixity
Reliability
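One simple way to turn "Dynamics -> Fixity" into practice is to checksum dependencies at time T1 and compare again at T2. The sketch below assumes the dependencies are local files; the helper names `snapshot` and `changed_since` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths):
    """Record one checksum per dependency at time T1."""
    return {str(p): sha256_of(p) for p in paths}


def changed_since(snapshot_t1):
    """Return dependencies whose content differs, or that vanished, at time T2."""
    changed = []
    for name, old_digest in snapshot_t1.items():
        path = Path(name)
        if not path.exists() or sha256_of(path) != old_digest:
            changed.append(name)
    return sorted(changed)
```

A checksum only detects that something changed. Deciding whether the change matters for the result is still a scientific judgement.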
Replicate harder than Reproduce?
Repeating the experiment or the set up?
Container Conundrum
Results will Vary
Replicability Window
All experiments become less replicable over time
Prepare to repair
Levels of Computational Reproducibility
Coverage: how
much of an
experiment is
reproducible
Original Experiment, Similar Experiment, Different Experiment
Portability
Depth: how much of an experiment is available
Binaries +
Data
Source Code /
Workflow
+ Data
Binaries +
Data +
Dependencies
Source Code /
Workflow
+ Data +
Dependencies
Virtual Machine
Binaries +
Data +
Dependencies
Virtual Machine
Source Code /
Workflow
+ Data +
Dependencies
Figures +
Data
[Freire, 2014]
Minimum:
data and source
code available
under terms
that permit
inspection and
execution.
Measuring Information Gain from Reproducibility
Research goal
Method/Alg.
Platform/Exec Env
Data Parameters
Input data
Actors
Information Gain
Implementation/Code
No change
Change
Don’t care
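The grid on this slide can be read as a profile of a reproduction attempt: for each dimension, was it held fixed, deliberately changed, or left unconstrained? The sketch below encodes that reading. The class and function names are hypothetical, and treating unlisted dimensions as "no change" is an assumption made for brevity.

```python
from enum import Enum


class Status(Enum):
    NO_CHANGE = "no change"
    CHANGE = "change"
    DONT_CARE = "don't care"


# Dimensions as listed on the slide.
DIMENSIONS = (
    "research goal",
    "method/algorithm",
    "implementation/code",
    "platform/exec env",
    "data parameters",
    "input data",
    "actors",
)


def describe_attempt(changes):
    """Build a full profile; unlisted dimensions default to NO_CHANGE."""
    unknown = set(changes) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    return {dim: changes.get(dim, Status.NO_CHANGE) for dim in DIMENSIONS}


def changed_dimensions(profile):
    """Return the dimensions that were deliberately varied, in slide order."""
    return [dim for dim in DIMENSIONS if profile[dim] is Status.CHANGE]
```

For example, a rerun by a different team on different hardware would mark "actors" and "platform/exec env" as changed and everything else as held fixed.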
https://linkingresearch.wordpress.com/2016/02/21/dagstuhl-seminar-report-reproducibility-of-data-oriented-experiments-in-e-scienc/
http://www.dagstuhl.de/16041
How? Preserve by Reporting, Reproduce by Reading
Archived
Record
Description Zoo
standards, common metadata
How? Preserve by Maintaining, Repairing, Containing
Reproduce by Running, Emulating, Reconstructing
Active Instrument Byte level Buildability Zoo
provenance
portability, preservation
robustness, versioning
access description
standards
common APIs
licensing, identifiers
standards,
common metadata
change
variation sensitivity
discrepancy handling
packaging, containers
FAIR RACE Reproducibility Dimensions
dependencies
steps
Research Object
Standards-based metadata framework for logically and
physically bundling resources with context,
http://researchobject.org
Bigger on the inside than the outside
external referencing
Manifest
Construction
Aggregates
link things together
Annotations
about things & their
relationships
Container
Research Object
Standards-based metadata framework for logically and
physically bundling resources with context,
http://researchobject.org
Packaging content & links:
Zip files, BagIt, Docker images
Catalogues & Commons Platforms:
FAIRDOM
Manifest
Description
Dependencies
what else is
needed
Versioning
its evolution
Checklists
what should
be there
Provenance
where it
came from
Identification
locate things
regardless where
id
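The idea of a manifest that aggregates resources and annotates them, packed into a zip container, can be sketched as follows. This is a minimal illustration, not the Research Object specification. The manifest keys (`aggregates`, `annotations`, `uri`, `about`, `content`) and the `.ro/manifest.json` location are loosely modelled on the RO Bundle layout and should be treated as assumptions to check against the published specification. The media type string is the one quoted elsewhere in these slides.

```python
import io
import json
import zipfile

# Media type quoted in the slides for Research Object bundles.
RO_BUNDLE_MIMETYPE = "application/vnd.wf4ever.robundle+zip"


def build_bundle(resources, annotations):
    """Pack resources plus a manifest into an in-memory zip archive.

    resources: dict mapping a path inside the bundle to its bytes content.
    annotations: list of {"about": path, "content": text} dicts (assumed shape).
    """
    manifest = {
        # Aggregates: the things the bundle links together.
        "aggregates": [{"uri": "/" + path} for path in sorted(resources)],
        # Annotations: statements about those things.
        "annotations": annotations,
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as bundle:
        # Write the media type first and uncompressed so tools can sniff it.
        bundle.writestr("mimetype", RO_BUNDLE_MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        bundle.writestr(".ro/manifest.json", json.dumps(manifest, indent=2))
        for path, content in resources.items():
            bundle.writestr(path, content)
    return buffer.getvalue()
```

Keeping the manifest separate from the payload is the design point: the same description could accompany a zip file, a BagIt bag or a container image.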
Systems Biology
Commons
• Link data, models
and SOPs
• Standards
• Span resources
• Snapshot + DOIs
• Bundle and export
• Logical bundles
Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects,
J Web Semantics doi:10.1016/j.websem.2015.01.003
application/vnd.wf4ever.robundle+zip
Workflow Research Objects
exchange, portability and
maintenance
*https://2016-oslo-repeatability.readthedocs.org/en/latest/overview-and-agenda.html
Asthma Research e-Lab
Dataset building and
releasing
Standardised
packing of Systems
Biology models
European Space
Agency RO Library
Large dataset
management for life
science workflows
LHC ATLAS
experiments
Notre Dame U Rostock
Encyclopedia of DNA
Elements
PeptideAtlas
Words matter.
Reproducibility is not an end.
It's a means to an end.
Beware reproducibility zealots.
50 Shades of Reproducibility.
form vs function
A conundrum:
big co-operative data-driven
science makes reproducibility
desirable but also means
dependency and change are to be
expected.
Lab analogy for
computational science
Bonus Slides
