Reproducibility,
Research Objects
and Reality
Professor Carole Goble
The University of Manchester, UK
Software Sustainability Institute, UK
ELIXIR UK,
FAIRDOM Association e.V.
carole.goble@manchester.ac.uk
University of Leiden, The Netherlands, 24 November 2016
Acknowledgements
• Dagstuhl Seminar 16041, January 2016
– http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041
• ATI Symposium Reproducibility, Sustainability and Preservation, April 2016
– https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/
– https://osf.io/bcef5/files/
• C. Titus Brown
• Juliana Freire
• David De Roure
• Stian Soiland-Reyes
• Barend Mons
• Tim Clark
• Daniel Garijo
• Norman Morrison
• Katy Wolstencroft
Phil Bourne
Natalie Stanford
Jacky Snoep
Stuart Owen
Marco Roos
Kristina Hettne
Alan Williams
Sean Bechhofer
Ian Fore
Rafael Jimenez
…. And many more
Michael Crusoe
Paul Groth
Niall Beard
Context: Computational Science
http://tpeterka.github.io/maui-project/
From: The Future of Scientific Workflows, Report of DOE Workshop 2015,
http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd
1. Observational,
experimental
2. Theoretical
3. Simulation
4. Data intensive
Motivation: Knowledge Turning
research infrastructures
• Computational tools
• Sharing platforms
• Knowledge
Exchange
• Reproducible
research
• Software and data
practices
• Policies
[Josh Sommer, for the picture]
Reproducibility, Research Objects and Reality, Leiden 2016
Reproducibility
Rampancy
NIH Rigor and Reproducibility
https://www.nih.gov/research-training/rigor-reproducibility
Plenty of
guidelines
cos.io/top
Plenty of
principles
https://wellcomeopenresearch.org/
Nature Scientific Data
Data as a first class citizen + Data Citation
Scholarly Communications Providers
Software as a first class citizen +
Software Citation
Funders
http://www.acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research/
republic of science*
regulation of science
institution cores / libraries / public services
*Merton’s four norms of scientific behaviour (1942)
FAIR
Findable
Accessible
Interoperable
Reusable
Intelligible
Reproducible
Citable
Track & Countable
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Research Infrastructure for
FAIR Management and Sharing of
Data, Operating Procedures, Model
For Systems and Synthetic Biology
Projects
Research Infrastructure for
FAIR Data for Life Sciences in
Europe
Data-Driven Science
design
cherry picking data, random seed
reporting, non-independent bias, poor
positive and negative controls, dodgy
normalisation, arbitrary cut-offs,
premature data triage, un-validated
materials, improper statistical analysis,
poor statistical power, stop when “get to
the right answer”, software
misconfigurations, misapplied black box
software
reporting
incomplete reporting of software configurations, parameters & resource
versions, missed steps, missing data, vague methods, missing software
Empirical Statistical Computational
V. Stodden, IMS Bulletin (2013)
Reproducibility and reliability of biomedical
research: improving research practice
“When I use a word," Humpty Dumpty
said in rather a scornful tone, "it means
just what I choose it to mean - neither
more nor less.”
Carroll, Through the Looking Glass
re-compute
replicate
rerun
repeat
re-examine
repurpose
recreate
reuse
restore
reconstruct
review
regenerate
revise
recycle
redo
robustness
tolerance
verification
compliance
validation
assurance
remix
Scientific publication goals:
(i) announce a result
(ii) convince readers it is correct.
Papers in experimental science
should describe the results and
provide a clear enough protocol to
allow successful repetition and
extension.
Papers in computational science
should describe the results and
provide the complete software
development environment, data
and set of instructions which
generated the figures.
Virtual Witnessing*
*Leviathan and the Air-Pump: Hobbes, Boyle, and the
Experimental Life (1985) Shapin and Schaffer.
Jill Mesirov
David Donoho
Computational
Complex Assemblies
Remote Calls
“Micro” Reproducibility
“Macro” Reproducibility
Fixivity
Validate
Verify
Trust
Repeatability:
“Sameness”
Same result
1 Lab
1 experiment
Reproducibility:
“Similarity”
Similar result
> 1 Lab
> 1 experiment
why the differences?
https://2016-oslo-repeatability.readthedocs.org/en/latest/repeatability-discussion.htm
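The contrast between "sameness" and "similarity" can be sketched in code. This is only an illustration, not a definition from the talk: the function names and the 5% tolerance are assumptions chosen for the example.

```python
import math


def is_repeatable(original: float, rerun: float) -> bool:
    """Repeatability: same lab, same experiment, same result."""
    return original == rerun


def is_reproducible(original: float, independent: float, rel_tol: float = 0.05) -> bool:
    """Reproducibility: a different lab or experiment gives a similar result.

    rel_tol is a hypothetical threshold. What counts as "similar" is a
    scientific judgement that the code cannot make for you.
    """
    return math.isclose(original, independent, rel_tol=rel_tol)
```

When `is_reproducible` returns False, the interesting question is the one on the slide: why the differences?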
Validate
Verify
Method Reproducibility
the provision of enough detail about
study procedures and data so the
same procedures could, in theory or in
actuality, be exactly repeated.
Result Reproducibility
(aka replicability)
obtaining the same results from the
conduct of an independent study
whose procedures are as closely
matched to the original experiment
as possible
Goodman et al., Science Translational Medicine 8 (341), 2016
Validate
Verify
Productivity
Track differences
Validate
Verify
reviewers want additional work
statistician wants more runs
analysis needs to be repeated
post-doc leaves,
student arrives
new/revised datasets
updated/new versions of
algorithms/codes
sample was contaminated
better kit - longer simulations
new partners, new projects
Personal & Lab
Productivity
Public Good
Reproducibility
Computational “Datascopes”
Methods
techniques, algorithms,
spec. of the steps, models
Materials
datasets, parameters,
algorithm seeds
Instruments
codes, services, scripts,
underlying libraries,
workflows, ref datasets
Laboratory
sw and hw infrastructure,
systems software,
integrative platforms
computational environment
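One way to picture the four "datascope" layers is as a record written alongside every run. The sketch below is an assumption-laden toy: the experiment (a mean of Gaussian samples), the function names and the record layout are all hypothetical and only show which kind of fact belongs to which layer.

```python
import platform
import random
import sys


def run_experiment(seed: int, n_samples: int) -> float:
    """Toy stochastic 'method': the mean of seeded Gaussian samples."""
    rng = random.Random(seed)  # explicit seed, so a rerun draws the same samples
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return sum(samples) / len(samples)


def describe_run(seed: int, n_samples: int) -> dict:
    """Record the run under the four layers named on the slide."""
    return {
        "methods": "mean of Gaussian samples",
        "materials": {"seed": seed, "n_samples": n_samples},
        "instruments": {"python": sys.version.split()[0]},
        "laboratory": {"platform": platform.platform()},
        "result": run_experiment(seed, n_samples),
    }
```

Recording the seed makes the toy run repeatable on one machine; the instrument and laboratory entries are what a reader would need to explain a different answer elsewhere.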
“Datascope” Practicalities
Methods
Materials
Instruments
Laboratory
Change Dependencies
science,
methods,
datasets
questions stay,
answers change
breakage, labs
decay, services,
techniques and
instruments
change, updated
datasets, services,
codes, hardware
software entropy
one offs,
streams,
stochastics,
sensitivities,
scale,
non-portable data
supercomputer
access
non-portable
software
licensing restrictions
unreliable resources
and third party codes
complexity
Blackboxes
blackbox
software
hidden manual
steps
Active Instrument
Byte level preservation
Reproduce by Running
Reproduce by Reading
Archived Record
Prepare to repair
ELNs
Markup Languages
Reporting Guidelines
Common Formats
Community
vocabularies
Record All
Automate All
Contain All
Expose All
Findable
Accessible
Interoperable
Reusable
provenance
portability
preservation
robustness
versioning
access description
standards
common APIs
licensing
standards,
common metadata
change
variation sensitivity
discrepancy handling
packaging, containers
FAIR RACE shades of reproducibility
dependencies
steps
ids
A robust infrastructure
for biological information.
bio.tools
https://usegalaxy.org/
Workflow Description
Workflows Preservation
Workflow Portability
Workflow Interoperability
Workflow Preservation and Exchange
Experiments
Workflows & Workflow Runs
Workflow Commons
Third Party Services
Scattered resources
Rich descriptions
Prepare to Repair
Standards-based metadata framework for bundling resources
with context
Citable Reproducible Packaging
Metadata for bundling resources scattered and stored somewhere else
Container
Research Object in a nutshell
Packaging content & links:
Zip files, BagIt, Docker images
Catalogues & Commons Platforms:
FAIRDOM, myExperiment
Manifest
Construction
Aggregates
link things together
Annotations
about things & their
relationships
Manifest
Description
Dependencies
what else is
needed
Versioning
its evolution
Checklists
what should
be there
Provenance
where it
came from
Identification
locate things
regardless where
id
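A manifest of this kind can be pictured as a small JSON document that aggregates resources and annotates them. The sketch below is loosely inspired by that idea only; the field names, the identifier and the file paths are hypothetical and are not taken from the Research Object specifications.

```python
import json


def build_manifest(ro_id, aggregates, annotations):
    """Assemble a manifest for a hypothetical Research Object.

    aggregates:  the things linked together (local files or remote URLs).
    annotations: statements about those things, each pointing at one of them.
    """
    known = {item["uri"] for item in aggregates}
    for note in annotations:
        # An annotation about something that is not aggregated has no context.
        if note["about"] not in known:
            raise ValueError(f"annotation refers to unaggregated resource: {note['about']}")
    return {"id": ro_id, "aggregates": aggregates, "annotations": annotations}


manifest = build_manifest(
    "urn:example:my-investigation",
    aggregates=[
        {"uri": "workflow/analysis.cwl", "version": "1.0"},
        # Content may be stored somewhere else and only linked.
        {"uri": "https://example.org/data/sample-42.csv"},
    ],
    annotations=[
        {"about": "workflow/analysis.cwl", "content": "annotations/provenance.ttl"},
    ],
)
print(json.dumps(manifest, indent=2))
```

The manifest is independent of the container: the same dictionary could sit inside a Zip file, a BagIt bag or a Docker image.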
Research Object Profile for Workflows…
Manifest
Description
Identification
locate things
regardless where
Minimum information
for one content type
Common properties
among content
types
Belhajjame et al. (2015), Using a suite of ontologies for preserving workflow-centric research objects,
J. Web Semantics, doi:10.1016/j.websem.2015.01.003
Hettne KM, et al. (2014), Structuring research methods and data with the research object model: genomics workflows as a
case study. J. Biomedical Semantics 5: 41
Workflow Research Object Bundles
exchange, portability and maintenance
BagIt
workflows packaged into
various containers for sharing
Checksum
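The checksum idea behind such bundles can be sketched with the standard library. This is a simplified illustration in the spirit of a BagIt payload manifest, not an implementation of the BagIt format; the function names and the demo file are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(payload_dir):
    """Map each file's relative path to its checksum."""
    root = Path(payload_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(payload_dir, manifest):
    """Return the files that are missing or no longer match the manifest."""
    root = Path(payload_dir)
    return [
        name
        for name, expected in manifest.items()
        if not (root / name).is_file() or sha256_of(root / name) != expected
    ]


def demo():
    """Package a toy workflow, then edit it and detect the change."""
    with tempfile.TemporaryDirectory() as tmp:
        workflow = Path(tmp) / "workflow.txt"
        workflow.write_text("step 1: align\n")
        manifest = build_manifest(tmp)
        before = changed_files(tmp, manifest)
        workflow.write_text("step 1: align --fast\n")
        after = changed_files(tmp, manifest)
    return before, after
```

A checksum only tells the recipient that the bytes changed or survived; it says nothing about whether the workflow still runs.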
Workflow and Workflow Management System Zoo
https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
bio.tools
A community led standard way
of expressing and running
workflows and command line
tools using containers
Ontologies for describing tools
and their inputs and outputs
Metadata framework for the
manifest versioning, file
integrity, more metadata
about the workflow
Workflow fragment containers
Findable
Accessible
Interoperable
Reusable
Data
Operations
Models
Systems and Synthetic Biology Projects
Funder: Legacy!
Partners
Project Support
Community Actions
Platforms, Tools
Web-based Portal
Public Commons
50+ projects
5 programmes
400+ people
22 independent
installations
Systems Approach…
Multiple, interrelated assets, Multiple, dispersed repositories
Literature
SOPS
STANDARDS
versioning,
tracking:
provenance,
parameters,
citation
Operations
FAIR Data and Metadata Standards that
help to improve understanding and exchange….
Nicolas Le Novère, Babraham Institute, UK.
…researchers do not always use them....
… model reuse and reproducibility tricky…
Stanford et al., The evolution of standards and data management practices in systems
biology, Molecular Systems Biology (2015) 11: 851, DOI 10.15252/msb.20156053
Systems Approach…
teams, processes, multi-partner, multi-discipline, legacy
Funders
Researchers
Publishers
What methods are being used to determine
enzyme activity?
What SOP was used for this sample?
Where is the validation data for this model?
Is there any group generating kinetic data?
Is this data available?
Track versions of my model
What's the relationship between the data and
model?
Which data belong to
which publications?
FAIR
A Commons
fairdomhub.org
Investigation
Study Analysis
Data
Model
SOP(Assay)
….organised in Investigation, Study, Assay/Analysis format
….registered using Just Enough Results description
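The Investigation, Study, Assay/Analysis organisation can be sketched as nested records with data, models and SOPs hanging off the assays. This is a hypothetical data model for illustration only; the class and field names are assumptions and do not reflect the FAIRDOM platform's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    kind: str      # "data", "model" or "sop"
    title: str
    location: str  # where the asset lives; it may be held in another store


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def assets_of_kind(self, kind: str) -> List[Asset]:
        """Collect every asset of one kind across all studies and assays."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
            if asset.kind == kind
        ]


investigation = Investigation(
    "My Investigation",
    studies=[
        Study(
            "Enzyme kinetics",
            assays=[
                Assay(
                    "Activity assay",
                    assets=[
                        Asset("sop", "Activity SOP", "https://example.org/sop/1"),
                        Asset("data", "Kinetic measurements", "https://example.org/data/1"),
                    ],
                ),
                Assay(
                    "Model validation",
                    assets=[Asset("model", "Kinetic model", "https://example.org/model/1")],
                ),
            ],
        )
    ],
)
```

With the context kept in one structure, a question such as "What SOP was used for this sample?" becomes a lookup, for example `investigation.assets_of_kind("sop")`.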
Just Enough
Results Model
Common elements
Uploaded into the
FAIRDOM Store
Linked to entry
in Public Archive
Linked to entry in
Project store
... aggregating catalogue
metadata across repositories, retain context-> reproduce, reuse
Local Stores
External
Databases
Publishing services
Secure
Stores
Model
Resources
… in situ reproducible models
metadata annotation against standards
model validation, comparison and simulation
SBML Model simulation
Model comparison
Model versioning
Reproducing simulations
[Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]
…. Nested Packages
context and credit
Research Objects
• Link
• Nest
• Span
• Bundle
• Snapshot
Systematic, Standards-
based metadata
framework for logically
and physically bundling
resources with context
• Exchange
• Reproduce
• Release packages
Reproducible Exchange and Publishing
and better credit
Author List: Joe Bloggs; Jane Doe
Title: My Investigation
Date: September 2016
DOI: https://doi.org/10.15490/seek##
information travels with the data and models
How do we do? Pretty well.
Reproducibility window. But that’s ok!
• Can’t contain everything
– Pesky Internet in a Box
• Can’t automate everything
– Pesky people
• Can’t fix everything
– Pesky science
Asthma Research
e-Laboratory
Release builds of
pharmacological
knowledge
warehouse
Exchanging
large datasets
Samiul Hasan, GSK
Biocuration need in Pharma: Drivers from a Translational Bioinformatics Perspective,
Poster S16
1st EASYM Conference, Berlin 2016
Reality
Preparation pain. Goldilocks paradox.
[Norman Morrison]
replication hostility no funding, time, recognition, place to publish
resource intensive access to the complete environment
“Data Parasites”
“Data Flirters”
“Share Drift”
Family
Friends
Potential Friends
Acquaintances
Strangers
Rivals
Reciprocity
Using FAIRDOM my own
lab colleagues saw what I
was doing and called to
collaborate!
Jurgen Haanstra
Vrije Universiteit Amsterdam, Netherlands
Trust …
Half of researchers make research data available
so they can be used by another.
Most not experienced any direct benefits
nor experienced many bad effects.
Caveat:
shared but usable?
fake sharing
funder requirements
fear data will be
misused or
misinterpreted
journal requirements
good research practice
facilitate collaborations
enable validation and
replication
higher citation rates
time and effort
new collaborations
extra funding for cost of data prep
enhance their academic reputation
feedback on how other researchers were using
their data
taken into account in funding
taken into account in career
jeopardise future publications
it's not ready to share
scrutiny scruples
answering questions
I won’t get credited
Metadata in by side effect
Tooling for annotations and checklist templates for different types of assay data.
Embed ontologies into
Excel templates
Excel spreadsheets enriched
with ontology annotations
Upload, extract metadata and register
http://www.rightfield.org.uk
Spreadsheet Ramps!!
Sharing by side effect …. libertarian paternalism
[Kristian Garza]
Finding and Citing by side effect
• Schema.org
• Structured
markup in web
pages
• Supported by
Content
Management
Systems
• Harvested by
search engines
• Builds snippets
and sidebars
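Structured markup of this kind is commonly embedded in a web page as JSON-LD. The sketch below builds such a snippet for a dataset using the schema.org `Dataset` type; the helper names and the example dataset are hypothetical, and any Bioschemas-specific properties are deliberately left out.

```python
import json


def dataset_jsonld(name, description, url):
    """Describe a dataset with a few basic schema.org properties."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }


def as_script_tag(markup):
    """Wrap the markup so it can be placed in a page for harvesters to read."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = as_script_tag(
    dataset_jsonld(
        "Kinetic measurements",
        "Enzyme activity measurements for an example assay.",
        "https://example.org/data/1",
    )
)
print(snippet)
```

If a content management system emits this tag for every page, the metadata arrives as a side effect of publishing rather than as an extra chore for the researcher.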
Bioschemas.org
Data
repository
Data
repository
Training
Resource
Bioschemas Bioschemas Bioschemas
Search engine Bio Registries
Biosharing
OLS, TeSS
bio.tools
UKCRC Tissue
Directory
bioCADDIE DATAMED
PDBe UniProt
Interpro Molgenis Pfam
Gene3D Biosamples
Biobank websites
BRENDA HPA
TransPlant EGA Beacons
EBI-Search
Google
Big co-operative data-driven
science makes reproducibility
desirable but also means
dependency and change are to be
expected
Words matter.
50 Shades of Reproducibility.
form vs function
Reproducibility is not an end.
Beware zealots.
Amplify Side effects
Think Research Objects!

Reproducibility, Research Objects and Reality, Leiden 2016

  • 1. Reproducibility, Research Objects and Reality Professor Carole Goble The University of Manchester, UK Software Sustainability Institute, UK ELIXIR UK, FAIRDOMAssociation e.V. carole.goble@manchester.ac.uk University of Leiden,The Netherlands, 24 November 2016
  • 2. Acknowledgements • Dagstuhl Seminar 16041 , January 2016 – http://www.dagstuhl.de/en/program/calendar/semhp/?semnr=16041 • ATI Symposium Reproducibility, Sustainability and Preservation , April 2016 – https://turing.ac.uk/events/reproducibility-sustainability-and-preservation/ – https://osf.io/bcef5/files/ • CTitus Brown • Juliana Freire • David De Roure • Stian Soiland-Reyes • Barend Mons • Tim Clark • Daniel Garijo • Norman Morrison • Katy Wolstencroft Phil Bourne Natalie Stanford Jacky Snoep Stuart Owen Marco Roos Kristina Hettne AlanWilliams Sean Bechhofer Ian Fore Rafael Jimenez …. And many more Michael Crusoe Paul Groth Niall Beard
  • 3. Context: Computational Science http://tpeterka.github.io/maui-project/ From:The Future of ScientificWorkflows, Report of DOEWorkshop 2015, http://science.energy.gov/~/media/ascr/pdf/programdocuments/docs/workflows_final_report.pd 1. Observational, experimental 2. Theoretical 3. Simulation 4. Data intensive
  • 4. Motivation: Knowledge Turning research infrastructures • Computational tools • Sharing platforms • Knowledge Exchange • Reproducible research • Software and data practices • Policies [Josh Sommer, for the picture]
  • 7. NIH Rigor and Reproducibility https://www.nih.gov/research- training/rigor-reproducibility Plenty of guidelines cos.io/top
  • 9. https://wellcomeopenresearch.org/ Nature Scientific Data Data as a first class citizen + Data Citation Scholarly Communications Providers
  • 10. Software as a first class citizen + Software Citation
  • 12. republic of science* regulation of science institution cores / libraries / public services *Merton’s four norms of scientific behaviour (1942)
  • 14. Research Infrastructure for FAIR Management and Sharing of Data, Operating Procedures, Model For Systems and Synthetic Biology Projects Research Infrastructure for FAIR Data for Life Sciences in Europe Data-Driven Science
  • 16. design cherry picking data, random seed reporting, non-independent bias, poor positive and negative controls, dodgy normalisation, arbitrary cut-offs, premature data triage, un-validated materials, improper statistical analysis, poor statistical power, stop when “get to the right answer”, software misconfigurations misapplied black box software reporting incomplete reporting of software configurations, parameters & resource versions, missed steps, missing data, vague methods, missing software Empirical Statistical Computational V. Stodden, IMS Bulletin (2013) Reproducibility and reliability of biomedical research: improving research practice
  • 17. “When I use a word," Humpty Dumpty said in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.” Carroll, Through the Looking Glass re-compute replicate rerun repeat re-examine repurpose recreate reuse restore reconstruct review regenerate revise recycle redo robustness tolerance verificationcompliancevalidation assurance remix
  • 18. Scientific publications goals: (i) announce a result (ii) convince readers its correct. Papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension. Papers in computational science should describe the results and provide the complete software development environment, data and set of instructions which generated the figures. VirtualWitnessing* *Leviathan and theAir-Pump: Hobbes, Boyle, and the Experimental Life (1985) Shapin and Schaffer. Jill Mesirov David Donoho
  • 21. Repeatability: “Sameness” Same result 1 Lab 1 experiment Reproducibility: “Similarity” Similar result > 1 Lab > 1 experiment why the differences? https://2016-oslo- repeatability.readthedocs.org/en/latest/repeatability-discussion.htm Validate Verify
  • 22. Method Reproducibility the provision of enough detail about study procedures and data so the same procedures could, in theory or in actuality, be exactly repeated. Result Reproducibility (aka replicability) obtaining the same results from the conduct of an independent study whose procedures are as closely matched to the original experiment as possible Goodman, et al ScienceTranslational Medicine 8 (341) 2016 Validate Verify
  • 24. reviewers want additional work statistician wants more runs analysis needs to be repeated post-doc leaves, student arrives new/revised datasets updated/new versions of algorithms/codes sample was contaminated better kit - longer simulations new partners, new projects Personal & Lab Productivity Public Good Reproducibility
  • 25. Computational “Datascopes” Methods techniques, algorithms, spec. of the steps, models Materials datasets, parameters, algorithm seeds Instruments codes, services, scripts, underlying libraries, workflows, ref datasets Laboratory sw and hw infrastructure, systems software, integrative platforms computational environment
  • 26. “Datascope” Practicalities Methods Materials Instruments Laboratory Change Dependencies science, methods, datasets questions stay, answers change breakage, labs decay, services, techniques and instruments change, updated datasets, services, codes, hardware software entropy one offs, streams, stochastics, sensitivities, scale, non-portable data supercomputer access non-portable software licensing restrictions unreliable resources and third party codes complexity Blackboxes blackbox software hidden manual steps blackbox software hidden manual steps
  • 28. Active Instrument Byte level preservation Reproduce by RunningReproduce by Reading Archived Record Prepare to repair ELNs Markup Languages Reporting Guidelines Common Formats Community vocabularies
  • 29. Record All Automate All Contain All Expose All Findable Accessible Interoperable Reusable
  • 30. provenance portability preservation robustness versioning access description standards common APIs licensing standards, common metadata change variation sensitivity discrepancy handling packaging, containers FAIR RACE shades of reproducibility dependencies stepsids
  • 31. A robust infrastructure for biological information. bio.tools
  • 33. Workflow Preservation and Exchange Experiments Workflows &Workflow Runs Workflow Commons Third Party Services Scattered resources
  • 34. Workflow Preservation and Exchange Experiments Workflows &Workflow Runs Workflow Commons Third Party Services Scattered resources Rich descriptions Prepare to Repair
  • 35. Standards-based metadata framework for bundling resources with context Citable Reproducible Packaging Metadata for bundling resources scattered and stored somewhere else
  • 36. Container Research Object in a nutshell Packaging content & links: Zip files, BagIt, Docker images Catalogues & Commons Platforms: FAIRDOM, myExperiment
  • 37. Manifest Construction Aggregates link things together Annotations about things & their relationships Container Research Object in a nutshell Manifest Description Dependencies what else is needed Versioning its evolution Checklists what should be there Provenance where it came from Identification locate things regardless where id Packaging content & links: Zip files, BagIt, Docker images Catalogues & Commons Platforms: FAIRDOM, myExperiment
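The manifest idea on this slide (aggregates that link things together, annotations about those things) can be sketched in a few lines. This is a minimal illustration only, loosely modelled on an `aggregates`/`annotations` layout; the function name `build_manifest`, the file names and the `example.org` URL are assumptions made for the example and do not come from the talk.

```python
import json


def build_manifest(ro_id, aggregates, annotations):
    """Build a manifest dict: what the Research Object aggregates, plus
    annotations about those things. Hypothetical helper for illustration."""
    aggregated = {entry["uri"] for entry in aggregates}
    for annotation in annotations:
        target = annotation["about"]
        # An annotation must describe the object itself or something it aggregates.
        if target != ro_id and target not in aggregated:
            raise ValueError(f"annotation is about an unknown resource: {target}")
    return {
        "id": ro_id,
        "aggregates": list(aggregates),
        "annotations": list(annotations),
    }


manifest = build_manifest(
    "/",
    # Aggregated resources may be packaged locally or merely linked to.
    [{"uri": "workflow.cwl"}, {"uri": "https://example.org/data/input.csv"}],
    [{"about": "workflow.cwl", "content": "annotations/provenance.ttl"}],
)
print(json.dumps(manifest, indent=2))
```

The second aggregate is a link rather than a packaged file, which mirrors the "content & links" point: the manifest can describe resources that are scattered and stored somewhere else.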
  • 38. Manifest Construction Aggregates link things together Annotations about things & their relationships Container Research Object Profile for Workflows… Manifest Description Identification locate things regardless where Minimum information for one content type Common properties among content types
  • 39. Research Object Profile for Workflows… Manifest Description Minimum information for one content type Common properties among content types
  • 40. Belhajjame et al (2015) Using a suite of ontologies for preserving workflow-centric research objects, J Web Semantics doi:10.1016/j.websem.2015.01.003 Hettne KM, et al (2014), Structuring research methods and data with the research object model: genomics workflows as a case study. J. Biomedical Semantics 5: 41 Workflow Research Object Bundles exchange, portability and maintenance BagIt workflows packaged into various containers for sharing Checksum
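The "BagIt ... Checksum" point can be made concrete with a small sketch of a BagIt-style payload manifest: one line per payload file, a digest followed by the path under `data/`. The helper names `payload_manifest` and `verify`, and the file name `workflow.t2flow`, are assumptions for illustration; this is not a full BagIt implementation and it works on in-memory bytes rather than a real bag on disk.

```python
import hashlib


def payload_manifest(files):
    """Return manifest text for a mapping of relative path -> file bytes."""
    lines = []
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        lines.append(f"{digest}  data/{path}")
    return "\n".join(lines) + "\n"


def verify(manifest_text, files):
    """Return the payload paths whose content is missing or no longer matches."""
    failed = []
    for line in manifest_text.splitlines():
        digest, path = line.split(maxsplit=1)
        name = path[len("data/"):]
        content = files.get(name)
        if content is None or hashlib.sha256(content).hexdigest() != digest:
            failed.append(name)
    return failed


files = {"workflow.t2flow": b"abc"}
manifest_text = payload_manifest(files)
print(manifest_text, end="")
print(verify(manifest_text, {"workflow.t2flow": b"abd"}))  # content changed
```

Recomputing the digests on receipt is what lets a bundle be exchanged and later checked for silent change or decay.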
  • 41. Workflow and Workflow Management System Zoo https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems
  • 42. bio.tools A community led standard way of expressing and running workflows and command line tools using containers Ontologies for describing tools and their inputs and outputs Metadata framework for the manifest versioning, file integrity, more metadata about the workflow Workflow fragment containers
  • 45. Project Support Community Actions Platforms, Tools Web-based Portal Public Commons 50+ projects 5 programmes 400+ people 22 independent installations
  • 46. Systems Approach… Multiple, interrelated assets, Multiple, dispersed repositories Literature SOPS STANDARDS versioning, tracking: provenance, parameters, citation Operations
  • 47. FAIR Data and Metadata Standards that help to improve understanding and exchange…. Nicolas Le Novère, Babraham Institute, UK. …researchers do not always use them....
  • 48. … model reuse and reproducibility tricky… Stanford et al. The evolution of standards and data management practices in systems biology, Molecular Systems Biology (2015) 11: 851 DOI 10.15252/msb.20156053
  • 49. Systems Approach… teams, processes, multi-partner, multi-discipline, legacy Funders Researchers Publishers
  • 50. What methods are being used to determine enzyme activity? What SOP was used for this sample? Where is the validation data for this model? Is there any group generating kinetic data? Is this data available? Track versions of my model What's the relationship between the data and model? Which data belong to which publications? FAIR
  • 53. Investigation Study Analysis Data Model SOP(Assay) ….organised in Investigation, Study, Assay/Analysis format ….registered using Just Enough Results description
  • 54. ….organised in Investigation, Study, Assay/Analysis format ….registered using Just Enough Results description. Just Enough Results Model Common elements
  • 55. ….organised in Investigation, Study, Assay/Analysis format ….registered using Just Enough Results description. Uploaded into the FAIRDOM Store Linked to entry in Public Archive Linked to entry in Project store
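The Investigation, Study, Assay/Analysis organisation described in the last three slides can be sketched as a nested data structure in which every asset keeps its context. The class and field names below are assumptions chosen for the example; they are not the FAIRDOM data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    kind: str   # e.g. "data", "model" or "SOP"
    title: str


@dataclass
class Assay:
    title: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def assets_with_context(investigation):
    """List every asset together with the Investigation/Study/Assay it belongs to."""
    rows = []
    for study in investigation.studies:
        for assay in study.assays:
            for asset in assay.assets:
                context = f"{investigation.title} / {study.title} / {assay.title}"
                rows.append((context, asset.kind, asset.title))
    return rows


inv = Investigation("My Investigation", [
    Study("Enzyme kinetics", [
        Assay("Activity assay", [Asset("SOP", "Assay protocol"),
                                 Asset("data", "Kinetic measurements")]),
    ]),
])
for row in assets_with_context(inv):
    print(row)
```

Walking the hierarchy answers questions such as which SOP was used for a given data set, because both sit under the same assay.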
  • 56. ... aggregating catalogue metadata across repositories, retain context-> reproduce, reuse Local Stores External Databases Publishing services Secure Stores Model Resources
  • 57. … in situ reproducible models metadata annotation against standards model validation, comparison and simulation SBML Model simulation Model comparison Model versioning Reproducing simulations [Jacky Snoep, Dagmar Waltemath, Martin Peters, Martin Scharm]
  • 59. Research Objects • Link • Nest • Span • Bundle • Snapshot Systematic, Standards-based metadata framework for logically and physically bundling resources with context • Exchange • Reproduce • Release packages
  • 60. Reproducible Exchange and Publishing and better credit Author List: Joe Bloggs; Jane Doe Title: My Investigation Date: September 2016 DOI: https://doi.org/10.15490/seek## information travels with the data and models
  • 61. How do we do? Pretty well. Reproducibility window. But that’s ok! • Can’t contain everything – Pesky Internet in a Box • Can’t automate everything – Pesky people • Can’t fix everything – Pesky science
  • 62. Asthma Research e-Laboratory Release builds of pharmacological knowledge warehouse Exchanging large datasets
  • 63. Samiul Hasan, GSK Biocuration need in Pharma: Drivers from a Translational Bioinformatics Perspective, Poster S16 1st EASYM Conference, Berlin 2016 Reality
  • 64. Preparation pain. Goldilocks paradox. [Norman Morrison] replication hostility no funding, time, recognition, place to publish resource intensive access to the complete environment
  • 65. “Data Parasites” “Data Flirters” “Share Drift” Family Friends Potential Friends Acquaintances Strangers Rivals Reciprocity
  • 66. Using FAIRDOM my own lab colleagues saw what I was doing and called to collaborate! Jurgen Haanstra Vrije Universiteit Amsterdam, Netherlands Trust …
  • 69. Half of researchers make research data available so they can be used by another. Most have not experienced any direct benefits nor experienced many bad effects. Caveat: shared but usable? fake sharing funder requirements fear data will be misused or misinterpreted journal requirements good research practice facilitate collaborations enable validation and replication higher citation rates time and effort new collaborations extra funding for cost of data prep enhance their academic reputation feedback on how other researchers were using their data taken into account in funding taken into account in career jeopardise future publications it's not ready to share scrutiny scruples answering questions I won’t get credited
  • 70. Metadata in by side effect Tooling for annotations and checklist templates for different types of assay data. Embed ontologies into Excel templates Excel spreadsheets enriched with ontology annotations Upload, extract metadata and register http://www.rightfield.org.uk Spreadsheet Ramps!!
  • 71. Sharing by side effect …. libertarian paternalism [Kristian Garza]
  • 72. Finding and Citing by side effect • Schema.org • Structured markup in web pages • Supported by Content Management Systems • Harvested by search engines • Builds snippets and sidebars Bioschemas.org
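The "structured markup in web pages" point can be illustrated with schema.org JSON-LD embedded in a script tag, which search engines can harvest. The sketch below uses only generic schema.org `Dataset` properties; the helper names and the `example.org` values are assumptions, and it does not follow any particular Bioschemas profile.

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Describe a data set using generic schema.org Dataset properties."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


def as_script_tag(markup):
    """Wrap the JSON-LD so it can be embedded in an HTML page."""
    body = json.dumps(markup, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


markup = dataset_jsonld(
    "Kinetic measurements",
    "Enzyme activity time courses (hypothetical example record).",
    "https://example.org/data/kinetics",
    ["enzyme kinetics", "systems biology"],
)
tag = as_script_tag(markup)
print(tag)
```

A content management system can emit such a tag from metadata it already holds, which is the sense in which finding and citing happen by side effect.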
  • 73. Data repository Data repository Training Resource Bioschemas Bioschemas Bioschemas Search engine Bio Registries Biosharing OLS, TeSS bio.tools UKCRC Tissue Directory bioCADDIE DATAMED PDBe UniProt Interpro Molgenis Pfam Gene3D Biosamples Biobank websites BRENDA HPA TransPlant EGA Beacons EBI-Search Google Finding and Citing by side effect Bioschemas.org
  • 74. Big co-operative data-driven science makes reproducibility desirable but also means dependency and change are to be expected Words matter. 50 Shades of Reproducibility. form vs function Reproducibility is not an end. Beware zealots. Amplify Side effects Think Research Objects!