Towards Reproducible Science: a few building blocks from my personal experience
Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana)
Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
http://www.oeg-upm.net/index.php/en/researchareas/3-semanticscience/index.html
ocorcho@fi.upm.es
@ocorcho
22/10/2017
S4BioDiv2017, Vienna

Introduction
[Diagram labels: HYPOTHESIS; CONVINCE AUDIENCE; REPEATABLE SCIENTIFIC EXPERIMENTS]
[Diagram labels: SCIENTIFIC EXPERIMENTS (IN VIVO/VITRO, IN SILICO); REPEATABILITY; Alison’s biodiversity scientists]

Before continuing…
What does reproducibility mean for you?
And for your colleagues?
And for colleagues from other disciplines?

The R* brouhaha
Source: The R* brouhaha. Goble C. RDA-Europe’s workshop on RepScience 2016.

My own take on terminology
[Diagram labels: PRESERVATION; CONSERVATION; REPLICABILITY; REPRODUCIBILITY]

Experiment components
[Diagram: columns DATA, SCIENTIFIC PROCEDURE, EQUIPMENT; rows IN VIVO/VITRO, IN SILICO]
[Callout on the diagram: This has attracted most of the attention so far]

Block 1. Experimental Protocols
Olga Giraldo, Alexander García
Explore alternative ways of documenting and retrieving information from experimental protocols
• Using Semantics and NLP in the SMART Protocols Repository. Giraldo O, García-Castro A, Corcho O. ICBO, 2015.
• Using Semantics and Natural Language Processing in Experimental Protocols. Giraldo O, García-Castro A, Figueredo J, Corcho O. J Biomedical Semantics, to appear.
• SMART protocols: semantic representation for experimental protocols. Giraldo O, García-Castro A, Corcho O. Linked Science 2014.

What is an experimental protocol?
Experimental protocols are like cooking recipes:
• They have ingredients: reagents and samples
• They have appliances: equipment
• They have a list of instructions
• They have a total time
• They have critical steps…
Protocols should contain complete information that allows anybody to recreate an experiment.

Some of the issues we aim to address
Example instructions:
• Incubate the centrifuge tubes in a water bath.
• Incubate the samples for 5 min with gentle shaking.
• Rinse DNA briefly in 1-2 ml of wash.
• Incubate at -20C overnight.
Issues:
• Some protocols present insufficient granularity.
• The instructions can be imprecise or ambiguous due to the use of natural language.
• The protocols lack structure.
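A minimal sketch of how such imprecision could be flagged automatically, using the four example instructions above. The word list, the unit list and the function name are assumptions made for illustration; they are not the rules used in the SMART Protocols work.

```python
import re

# Hypothetical rule set: the vague words and units below are illustrative
# assumptions, not rules taken from the SMART Protocols project.
VAGUE_TERMS = re.compile(r"\b(briefly|gentle|gently|overnight)\b", re.IGNORECASE)
QUANTITY = re.compile(
    r"-?\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?\s*(?:min|ml|rpm|h|s|C)\b",
    re.IGNORECASE,
)


def review_instruction(text):
    """Return the vague terms and the explicit quantities found in one instruction."""
    return {
        "vague_terms": [m.group(0).lower() for m in VAGUE_TERMS.finditer(text)],
        "quantities": [m.group(0) for m in QUANTITY.finditer(text)],
    }


for instruction in [
    "Incubate the centrifuge tubes in a water bath.",
    "Incubate the samples for 5 min with gentle shaking.",
    "Rinse DNA briefly in 1-2 ml of wash.",
    "Incubate at -20C overnight.",
]:
    print(instruction, review_instruction(instruction))
```

An instruction with no quantity at all (the water bath example) or with a vague term next to a quantity ("briefly", "overnight") would be a candidate for review by the protocol author.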

Ingredients for Improving Reproducibility
• Bio-ontologies: OBI, EXPO, EXACT, BAO, IAO, ERO…
• Data repositories for making data available
• Resources for reporting guidelines or Minimum Information standards
Few efforts focus on representing and standardizing experimental protocols. For reproducibility purposes, if the data must be available, so must the experimental protocol detailing the methodology followed to derive the data.

Main research question
How to formalize the information from laboratory protocols as a knowledge base?

Our approach
• Ontology model representing lab protocols
• Gazetteer-based method: use existing lists of named entities
  o Lists of proper nouns, which refer to real-life entities
• Rule-based approaches: write manual extraction rules
• Development of a Gold Standard of manually annotated protocols
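The gazetteer-based method can be sketched in a few lines. This is a plain Python dictionary lookup, not the actual tooling behind the project; the gazetteer entries are a handful of entity names that appear elsewhere in this talk, and the function name is hypothetical.

```python
import re

# Hypothetical mini-gazetteer: entity name -> category. A real gazetteer
# would be built from much larger existing lists of named entities.
GAZETTEER = {
    "neural stem cells": "sample",
    "cell culture incubator": "instrument",
    "centrifuge": "instrument",
    "dmem/f12": "reagent",
    "glucose": "reagent",
}


def annotate(text, gazetteer):
    """Return (start, end, surface form, category) for each gazetteer hit.

    Longer names are matched first so that a short entry never splits
    a longer entity that contains it.
    """
    hits = []
    taken = [False] * len(text)
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(text):
            if not any(taken[m.start():m.end()]):
                for i in range(m.start(), m.end()):
                    taken[i] = True
                hits.append((m.start(), m.end(), m.group(0), gazetteer[name]))
    return sorted(hits)


sentence = "Spin the neural stem cells in a centrifuge, then add glucose."
for start, end, surface, category in annotate(sentence, GAZETTEER):
    print(surface, "->", category)
```

Lookup alone cannot annotate whole instructions, which is where the manually written extraction rules come in.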

SMART Protocols ontology
http://vocab.linkeddata.es/SMARTProtocols/
https://smartprotocols.github.io/

The SIRO model
• Sample/Specimen (whole organism, anatomical part, bodily fluids, etc.)
• Instruments (equipment, devices, consumables, software)
• Reagents (chemical compounds, mixtures)
• Objective (purpose)
The SIRO model supports search, retrieval and classification of experimental protocols.
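To illustrate how a SIRO description could support search, here is a small sketch. The class, its field names, the protocol titles and the second objective are hypothetical; only the four SIRO elements and the entity names come from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class SiroRecord:
    """Hypothetical record holding the four SIRO elements of one protocol."""
    title: str
    objective: str
    samples: frozenset = field(default_factory=frozenset)
    instruments: frozenset = field(default_factory=frozenset)
    reagents: frozenset = field(default_factory=frozenset)


def find_protocols(records, sample=None, instrument=None, reagent=None):
    """Return the titles of the records that contain every facet value supplied."""
    wanted = [(sample, "samples"), (instrument, "instruments"), (reagent, "reagents")]
    return [
        r.title
        for r in records
        if all(value is None or value in getattr(r, attr) for value, attr in wanted)
    ]


# Illustrative records; titles are placeholders, not real protocol names.
records = [
    SiroRecord(
        title="Protocol A (illustrative)",
        objective="in vitro assessment of neural migration",
        samples=frozenset({"neural stem cells"}),
        instruments=frozenset({"microscope", "cell culture incubator"}),
        reagents=frozenset({"glucose", "DMEM/F12"}),
    ),
    SiroRecord(
        title="Protocol B (illustrative)",
        objective="placeholder objective",
        samples=frozenset({"insect cells"}),
        instruments=frozenset({"centrifuge", "flask"}),
        reagents=frozenset({"phenol:chloroform"}),
    ),
]

print(find_protocols(records, reagent="glucose"))
```

Each facet narrows the result set, so a query can combine a sample, an instrument and a reagent in one call.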

Design of semantic Gazetteers and JAPE rules
Design of semantic Gazetteers
• Facilitate the annotation of instances related to:
  o Experimental actions
  o Instruments
  o Samples/organisms
  o Reagents
Design of grammar rules
• Facilitate the annotation of instructions

Development of a Gold Standard
Materials: 100 protocols published in several repositories
• BioTechniques
• CSH-Protocols
• Current protocols
• Genet and Mol. Res
• Journal of Biolog. Methods
• Jove
• MethodsX
• Nature protocols exchange
• Nature protocols
Annotators: experts in life sciences
• Curso BIOS 2016, Colombia
• Universidad del Valle, Colombia
• Japan (Database Center for Life Science (DBCLS), Robotic Biology Institute (RBI), Spiber, Yachie-Lab, University of Tokyo)
• Universidad Santiago de Cali, Colombia
The SMART Protocols Annotation Tool: http://smart-protocols.labs.linkingdata.io/dist/dev/#/login
Guidelines on what and how to annotate

Preliminary results
Counts are the number of raters (out of 3) who chose each category.

Category    Entity                                          sample  instrument  reagent  objective
Sample      Neural cell                                     3       0           0        0
Sample      neural stem cells (NSCs)                        3       0           0        0
Instrument  Cell culture centrifuge                         0       3           0        0
Instrument  cell culture incubator                          0       3           0        0
Instrument  Microscope                                      0       3           0        0
Instrument  Millicell culture plate inserts 8-µm pore size  0       3           0        0
Reagent     B27 supplement                                  0       0           3        0
Reagent     DMEM/F12                                        0       0           3        0
Reagent     FGF2 neutralizing antibody                      0       0           3        0
Reagent     glucose                                         0       0           3        0
Objective   (text below)                                    0       0           0        3
Objective text: "Here we describe two migration assays, a matrigel migration assay and a Boyden chamber migration assay, which allow the in vitro assessment of neural migration under defined conditions" (Ladewig, Koch and Brüstle, 2014).
Fleiss' kappa for 3 raters = 1.0

Category                   Entity                sample  instrument  reagent
Reagent - Sample/Organism  Ac-omega viral DNA    1       -           2
Reagent - Sample/Organism  baculoviral           1       -           2
Reagent - Sample/Organism  DNA insert            2       -           1
Reagent - Sample/Organism  I-Sce I meganuclease  1       -           2
Sample/Organism            Insect cells          3       -           -
Instrument                 spinner               -       3           -
Instrument                 Centrifuge            -       3           -
Instrument                 Flask                 -       3           -
Reagent                    IPL-41 powdered       -       -           3
Reagent                    Liposome formulation  -       -           3
Reagent                    Phenol:chloroform     -       -           3
(- marks a cell left empty on the slide)
Fleiss' kappa for 3 raters = 0.755
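For readers unfamiliar with the agreement measure, the following sketch computes Fleiss' kappa from a table of per-entity rater counts. It uses the standard textbook formula and only the first results table (eleven entities, three raters, full agreement); the function and variable names are assumptions, and the 0.755 figure is not recomputed here.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a list of rows, one row of category counts per item.

    Every row must sum to the same number of raters. The value is undefined
    (division by zero) when all ratings fall into a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Observed agreement for each item, then averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(per_item) / n_items
    # Agreement expected by chance, from the overall category proportions.
    totals = [sum(row[j] for row in counts) for j in range(n_categories)]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)


# Columns: sample, instrument, reagent, objective (first results table).
first_table = (
    [[3, 0, 0, 0]] * 2      # two sample entities
    + [[0, 3, 0, 0]] * 4    # four instrument entities
    + [[0, 0, 3, 0]] * 4    # four reagent entities
    + [[0, 0, 0, 3]]        # the objective sentence
)
kappa = fleiss_kappa(first_table)
print(kappa)
```

Any row in which the raters split, such as 1 versus 2, lowers the observed agreement and therefore pulls kappa below 1.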

Our ongoing work
• So far, this is fine for handling protocols that have already been reported in papers
Can we actually change the way in which these protocols are produced?

Platform for publishing semantic protocols
Features:
• Open semantic publishing platform
  o The protocols are born semantic
• Self-describing documents
  o Meaningful entities
  o Machine-processable workflows
• Documents will reference existing URIs
  o Samples/organisms
  o Reagents/chemical compounds
  o Instruments
[Diagram labels: SMART Protocols Ontology / Gazetteers / Grammar rules; UniProt; NCBI taxonomy; PubChem; Vendors]

The platform
Platform available at: http://smartprotocols.labs.linkingdata.io/app/protocols

Capturing relevant elements in the document

Organisms come from the UniProt Taxon API
After selecting an organism, the corresponding ID is automatically recorded

Reagents come from the PubChem API

Machine-processable workflows
[Diagram: a sequence of five boxes, each labelled Step]

Final edited protocol, also available as bioschemas

Block 2. Computational Environments
Idafen Santana
Is it possible to describe the main properties of the Execution Environment of a Computational Scientific Experiment and, based on this description, derive a reproduction process for generating an equivalent environment using virtualization techniques?
Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies. Santana-Pérez I. PhD thesis, 2016. http://oa.upm.es/39520/

Experiment components
[Diagram, repeated over several slides: columns DATA, SCIENTIFIC PROCEDURE, EQUIPMENT; rows IN VIVO/VITRO, IN SILICO; some slides show only the IN SILICO row]

A Research Object bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms, “tool middleware”
http://www.w3.org/community/rosc/
http://www.researchobject.org/

Open Research Problems
• Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
• Execution Environments are poorly described.
• Current reproducibility approaches for computational experiments consider mostly data and procedure.

Representation
• Describing execution environments
[Diagram labels: FORMER EQUIPMENT; ANNOTATE; SEMANTIC ANNOTATIONS; REPRODUCE; EQUIVALENT EXECUTION ENVIRONMENT; CLOUD]
• WICUS ontology network
  o Workflow Infrastructure Conservation Using Semantics
  o http://purl.org/net/wicus
  o 5 ontologies
    • WICUS Workflow Execution Requirements ontology
    • WICUS Software Stack ontology
    • WICUS Hardware Specs ontology
    • WICUS Scientific Virtual Appliance ontology
    • WICUS Ontology: links the previous ontologies
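To give a feel for what such descriptions make possible, here is a toy sketch that turns a description of software dependencies and hardware needs into a deployment plan. It uses plain Python dictionaries rather than the WICUS ontologies; every component name, key and appliance below is hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical annotations: each component maps to the components it depends on.
# None of these names come from the WICUS vocabulary.
software_dependencies = {
    "workflow-engine": {"java-runtime", "python"},
    "analysis-tool": {"numeric-library"},
    "numeric-library": {"c-compiler"},
}
hardware_requirements = {"cpu_cores": 2, "memory_gb": 4}
available_appliances = [
    {"name": "small-vm", "cpu_cores": 1, "memory_gb": 2},
    {"name": "medium-vm", "cpu_cores": 2, "memory_gb": 8},
]


def installation_order(dependencies):
    """Order components so that each one comes after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def choose_appliance(appliances, requirements):
    """Return the first appliance that meets every hardware requirement, or None."""
    for appliance in appliances:
        if all(appliance[key] >= minimum for key, minimum in requirements.items()):
            return appliance
    return None


plan = {
    "appliance": choose_appliance(available_appliances, hardware_requirements),
    "install": installation_order(software_dependencies),
}
print(plan)
```

The design point is the separation of concerns: requirements, software stack, hardware specs and appliances are described independently and only combined when an environment has to be generated.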

WICUS ontology network
• WICUS Workflow Execution Requirements ontology: http://purl.org/net/wicus-reqs
• WICUS Software Stack ontology: http://purl.org/net/wicus-stack
• WICUS Scientific Virtual Appliance ontology: http://purl.org/net/wicus-sva
• WICUS Hardware Specs ontology: http://purl.org/net/wicus-hwspecs
• WICUS ontology network: http://purl.org/net/wicus

WICUS system
• Overview, inputs and outputs

Evaluation
• Workflows reproduced
  o 3 scientific domains
  o 3 workflow management systems
  o 6 different workflows
Domain: Seismic, Astronomy, Bio
WMS: dispel4py, Pegasus, Makeflow
Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST
Years, in the order shown on the slide: (2003) (2014) (2014) (2015) (2011) (2011)

Evaluation
Results
[Diagram labels: FORMER EQUIPMENT; ANNOTATE; SEMANTIC ANNOTATIONS; REPRODUCE; EQUIVALENT EXECUTION ENVIRONMENT; CLOUD; COMPARE]
Result notes shown on successive slides:
• Non-deterministic; standard and error output; generated files equivalent
• Same results; results from Int. Extinction may vary
• Genomic data; exact match
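The COMPARE step can be pictured as a check on the files produced by the original and the reproduced executions. The sketch below does an exact byte-level comparison with SHA-256 digests; it is an assumption about how such a check might look, not the procedure used in the evaluation, and the directory contents are invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to the SHA-256 digest of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_runs(original_dir, reproduced_dir):
    """Report files that are missing, extra, or different in the reproduced run."""
    a, b = digest_tree(original_dir), digest_tree(reproduced_dir)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "different": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }


# Two throwaway directories stand in for the original and reproduced executions.
with tempfile.TemporaryDirectory() as original, tempfile.TemporaryDirectory() as reproduced:
    Path(original, "result.txt").write_text("42\n")
    Path(reproduced, "result.txt").write_text("42\n")
    Path(original, "run.log").write_text("finished at 10:00\n")
    Path(reproduced, "run.log").write_text("finished at 10:05\n")
    report = compare_runs(original, reproduced)

print(report)
```

An exact match is only a sensible criterion for deterministic outputs; for non-deterministic workflows, or for logs carrying timestamps as in the example, a looser notion of equivalence is needed.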

Summarizing
• Two building blocks towards reproducibility of scientific experiments
  o In vivo/vitro
    • Focus on providing structured descriptions of methods (laboratory protocols)
    • Our tools: ontologies, gazetteers, NLP tools, and automatic and manual annotation tools
    • Challenge: make protocols more structured (and semantic) from the beginning
  o In silico
    • Focus on the equipment (computational infrastructure) for workflow-based experiments
    • Ontologies, automatic and manual annotation tools, and an execution environment
    • Challenge: keep track of all types of appliances, and get scientists to provide annotations
• Is this enough?
Clearly not, but it is a step forward towards ensuring reproducibility (with a focus on methods).
Light pollution (www.stars4all.eu)

Towards Reproducible Science: a few building blocks from my personal experience

  • 1. Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana) http://www.oeg-upm.net/index.php/en/researchareas/3- semanticscience/index.html Ontology Engineering Group Universidad Politécnica de Madrid, Spain Towards Reproducible Science: a few building blocks from my personal experience ocorcho@fi.upm.es @ocorcho 22/10/2017 S4BioDiv2017, Vienna
  • 2. Towards Reproducible Science Introduction 2 HYPOTHESIS CONVINCE AUDIENCE REPEATABLE SCIENTIFIC EXPERIMENTS
  • 3. Towards Reproducible Science Introduction 3 SCIENTIFIC EXPERIMENTS IN VIVO/VITRO IN SILICO Alison’s biodiversity scientists
  • 4. Towards Reproducible Science Introduction 4 SCIENTIFIC EXPERIMENTS IN VIVO/VITRO IN SILICO REPEATABILITY Alison’s biodiversity scientists
  • 5. Towards Reproducible Science 5  Before continuing…. What does reproducibility mean for you? And for your colleagues? And for the colleagues from other disciplines?
  • 6. Towards Reproducible Science The R* brouhaha 6 Source: The R* brouhaha. Goble C. RDA-Europe’s workshop on RepScience 2016.
  • 7. Towards Reproducible Science My own take on terminology PRESERVATION CONSERVATION 7
  • 8. Towards Reproducible Science My own take on terminology PRESERVATION CONSERVATION REPLICABILITY REPRODUCIBILITY 8
  • 9. Towards Reproducible Science Experiment components 9 DATA SCIENTIFIC PROCEDURE EQUIPMENT INVIVO/VITROINSILICO
  • 10. Towards Reproducible Science Experiment components 10 DATA SCIENTIFIC PROCEDURE EQUIPMENT INVIVO/VITROINSILICO This has attracted most of the attention so far
  • 11. Towards Reproducible Science Block 1. Experimental Protocols 11 Olga Giraldo Alexander Garcia Explore alternative ways for documenting and retrieving information from experimental protocols Using Semantics and NLP in the SMART Protocols Repository. Giraldo O, García-Castro A, Corcho O - ICBO, 2015 Using Semantics and Natural Language Processing in Experimental Protocols. Giraldo O, García-Castro A, Figueredo J, Corcho O - J Biomedical Semantics, to appear SMART protocols: semantic representation for experimental protocols. Giraldo O, García-Castro A, Corcho O – Linked Science 2014
  • 12. Towards Reproducible Science What is an experimental protocol  Experimental protocols are like cooking recipes  They have ingredients: reagents and sample  They have appliances: equipment,  They have a list of instructions, The protocols should have complete information that allows anybody to recreate an experiment.  They have a total time  They have critical steps…
  • 13. Towards Reproducible Science Some of the issues we aim at addressing • Incubate the centrifuge tubes in a water bath. • Incubate the samples for 5 min with gentle shaking. • Rinse DNA briefly in 1-2 ml of wash. • Incubate at -20C overnight.  some protocols present insufficient granularity,  the instructions can be imprecise or ambiguous due to the use of natural language.  The protocols lack structure
  • 14. Towards Reproducible Science Bio-ontologies OBI, EXPO, EXACT, BAO, IAO, ERO… Data repository for making data available few efforts focus on representing and standardizing experimental protocols. For reproducibility purposes, if the data must be available, so does the experimental protocol detailing the methodology followed to derive the data. Resources for reporting guidelines or Minimum Information standards Ingredients for Improving Reproducibility
  • 15. Towards Reproducible Science Main research question How to formalize the information from laboratory protocols as a knowledge base?
  • 16. Towards Reproducible Science Our approach • Ontology model representing lab protocols • Gazetteer-based method: use existing lists of named entities  Lists of proper nouns, which refer to real-life entities • Rule-based approaches: write manual extraction rules • Development of a Gold Standard of protocols annotated manually
  • 17. Towards Reproducible Science SMART Protocols ontology 17 http://vocab.linkeddata.es/SMARTProtocols/ https://smartprotocols.github.io/
  • 18. Towards Reproducible Science The SIRO model Sample/Specimen (whole organism, anatomical part, bodily fluids, etc.) Instruments (equipment, devices, consumables, software) Reagents (chemical compounds, mixtures) Objective (purpose) The SIRO model supports search, retrieval and classification of experimental protocols
  • 19. Towards Reproducible Science Design of semantic Gazetteer and JAPE rules Design of semantic Gazetteers • Facilitate the annotation of instances related to:  Experimental actions  Instruments  Samples/ organisms  Reagents Design of grammar rules • Facilitate the annotation of instructions
  • 20. Towards Reproducible Science Development of a Gold Standard 100 protocols published in several repositories Annotators - experts in life sciences http://smart- protocols.labs.linkingdata.io/dist/d ev/#/login The SMART Protocols Annotation Tool Guidelines about What and How annotate Materials: • BioTechniques, • CSH-Protocols, • Current protocols, • Genet and Mol. Res, • Journal of Biolog. Methods, • Jove, • MethodsX, • Nature protocols exchange, • Nature protocols • Curso BIOS 2016, Colombia • Universidad del Valle, Colombia • Japan (Database Center for Life Science (DBCLS), Robotic Biology Institute (RBI), Spiber, Yachie-Lab, University of Tokyo). • Universidad Santiago de Cali, Colombia
  • 21. Towards Reproducible Science Preliminary results Entities sample instrument reagent objective Sample Neural cell 3 0 0 0 neural stem cells (NSCs) 3 0 0 0 Instrument Cell culture centrifuge 0 3 0 0 cell culture incubator 0 3 0 0 Microscope 0 3 0 0 Millicell culture plate inserts 8-?m pore size 0 3 0 0 reagent B27 supplement 0 0 3 0 DMEM/F12 0 0 3 0 FGF2 neutralizing antibody 0 0 3 0 glucose 0 0 3 0 objective Here we describe two migration assays, a matrigel migration assay and a Boyden chamber migration assay, which allow the in vitro assessment of neural migration under defined conditions (Ladewig, Koch and Brüstle, 2014). 0 0 0 3 entities sample instrument reagent Reagent - Sample/Organism Ac-omega viral DNA 1 2 baculoviral 1 2 DNA insert 2 1 I-Sce I meganuclease 1 2 Sample/Organism Insect cells 3 Instrument spinner 3 Centrifuge 3 Flask 3 Reagent IPL-41 powdered 3 Liposome formulation 3 Phenol:chloroform 3 Fleiss Kappa for 3 raters = 1.0 Fleiss Kappa for 3 raters = 0.755
  • 22. Towards Reproducible Science Our ongoing work 22  So far, this is ok for handling protocols that have been already reported in papers Can we actually change the way in which these protocols are produced?
  • 23. Towards Reproducible Science Platform for publishing semantic protocols Features:  Open semantic publishing platform o The protocols are born semantic  Self describing documents o Meaningful entities o Machine procesable workflows  Documents will reference existing URIs o Samples/organisms o Reagents/chemical compounds o Instruments SMART Protocols Ontology / Gazetteers / Grammar rules UniProt NCBI taxonomy PubChem Vendors
  • 24. Towards Reproducible Science Platform available at: http://smartprotocols.labs.linkingdata.io/app/protocols The platform
  • 25. Towards Reproducible Science 25 Capturing relevant elements in the document
  • 26. Towards Reproducible Science Organisms come from the UniProt Taxon API 26 After selecting an organism, the correspondent ID is automatically recorded
  • 27. Towards Reproducible Science Reagents come from the PubChem API
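The slide does not say which PubChem endpoint the platform calls. As a hedged sketch, assuming the public PUG REST name-to-CID lookup, a reagent name could be resolved to a PubChem compound identifier as follows. The helper names `cid_lookup_url` and `parse_cids` are hypothetical, and the response body is a hand-written stand-in so that the sketch runs offline:

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(reagent_name: str) -> str:
    """Build the PUG REST URL that resolves a compound name to PubChem CIDs."""
    return f"{PUG_REST}/compound/name/{quote(reagent_name, safe='')}/cids/JSON"


def parse_cids(response_text: str) -> list:
    """Extract the CID list from a PUG REST JSON response body."""
    payload = json.loads(response_text)
    return payload.get("IdentifierList", {}).get("CID", [])


# Hand-written stand-in for a response body (illustrative value, not fetched).
sample_response = '{"IdentifierList": {"CID": [5793]}}'

print(cid_lookup_url("glucose"))
print(parse_cids(sample_response))
```

To query the live service, the URL can be fetched with `urllib.request.urlopen` and the decoded body passed to `parse_cids`. Recording the returned identifier alongside the reagent name is what lets a protocol reference an existing URI instead of free text.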
  • 28. Towards Reproducible Science Machine-processable workflows (diagram: a sequence of five steps)
  • 29. Towards Reproducible Science Final edited protocol, also available as bioschemas
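The slide only states that the edited protocol is also available as bioschemas markup. The sketch below shows one way a protocol with ordered steps could be serialized as JSON-LD, assuming the generic schema.org `HowTo`, `HowToStep`, `HowToSupply` and `HowToTool` types rather than the exact profile used by the platform. The function `protocol_to_jsonld` and the step texts are placeholders; the entity names are taken from slide 21:

```python
import json


def protocol_to_jsonld(name, sample, reagents, instruments, steps):
    """Serialize a protocol as a schema.org HowTo JSON-LD document."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "about": {"@type": "Thing", "name": sample},
        # Reagents are consumed by the protocol, instruments are reused.
        "supply": [{"@type": "HowToSupply", "name": r} for r in reagents],
        "tool": [{"@type": "HowToTool", "name": i} for i in instruments],
        "step": [
            {"@type": "HowToStep", "position": n, "text": text}
            for n, text in enumerate(steps, start=1)
        ],
    }


doc = protocol_to_jsonld(
    name="Example protocol (placeholder title)",
    sample="neural stem cells (NSCs)",
    reagents=["DMEM/F12", "B27 supplement"],
    instruments=["cell culture incubator"],
    steps=["First step (placeholder text)", "Second step (placeholder text)"],
)
print(json.dumps(doc, indent=2))
```

Keeping each step as a separate, positioned object is what makes the workflow machine-processable: a consumer can walk the steps in order without parsing narrative text.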
  • 30. Towards Reproducible Science Block 2. Computational Environments. Idafen Santana. Is it possible to describe the main properties of the Execution Environment of a Computational Scientific Experiment and, based on this description, derive a reproduction process for generating an equivalent environment using virtualization techniques? Conservation of Computational Scientific Execution Environments for Workflow-based Experiments Using Ontologies. Santana-Pérez I. PhD thesis, 2016. http://oa.upm.es/39520/
  • 31. Towards Reproducible Science Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO and IN SILICO)
  • 32-35. Towards Reproducible Science Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN SILICO)
  • 36. Towards Reproducible Science A Research Object bundles and relates digital resources of a scientific experiment or investigation using standard mechanisms; “tool middleware”. http://www.w3.org/community/rosc/ http://www.researchobject.org/
  • 37. Towards Reproducible Science Experiment components: DATA, SCIENTIFIC PROCEDURE, EQUIPMENT (IN VIVO/VITRO and IN SILICO)
  • 38-41. Towards Reproducible Science Open Research Problems
      o Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow.
      o Execution Environments are poorly described.
      o Current reproducibility approaches for computational experiments consider mostly data and procedure.
  • 42. Towards Reproducible Science Representation: Describing execution environments. Diagram labels: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT, CLOUD
  • 43. Towards Reproducible Science Representation: WICUS ontology network
      o Workflow Infrastructure Conservation Using Semantics
      o http://purl.org/net/wicus
      o 5 ontologies
          - WICUS Workflow Execution Requirements ontology
          - WICUS Software Stack ontology
          - WICUS Hardware Specs ontology
          - WICUS Scientific Virtual Appliance ontology
          - WICUS Ontology: links the previous ontologies
  • 44. Towards Reproducible Science WICUS ontology network: WICUS Workflow Execution Requirements ontology, http://purl.org/net/wicus-reqs
  • 45. Towards Reproducible Science WICUS ontology network: WICUS Software Stack ontology, http://purl.org/net/wicus-stack
  • 46. Towards Reproducible Science WICUS ontology network: WICUS Scientific Virtual Appliance ontology, http://purl.org/net/wicus-sva
  • 47. Towards Reproducible Science WICUS ontology network: WICUS Hardware Specs ontology, http://purl.org/net/wicus-hwspecs
  • 48-49. Towards Reproducible Science WICUS ontology network: http://purl.org/net/wicus
  • 50. Towards Reproducible Science WICUS system: Overview, inputs and outputs
  • 51. Towards Reproducible Science Evaluation: Workflows reproduced
      o 3 scientific domains
      o 3 workflow management systems
      o 6 different workflows
    Domain: Seismic, Astronomy, Bio
    WMS: dispel4py, Pegasus, Makeflow
    Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST
    Years shown on the slide: (2003) (2014) (2014) (2015) (2011) (2011)
  • 52-57. Towards Reproducible Science Evaluation
    Domain: Seismic, Astronomy, Bio
    WMS: dispel4py, Pegasus, Makeflow
    Name: xcorr, Internal Extinction, Montage, Epigenomics, SoyKB, BLAST
    Results. Diagram labels: FORMER EQUIPMENT, ANNOTATE, SEMANTIC ANNOTATIONS, REPRODUCE, EQUIVALENT EXECUTION ENVIRONMENT, CLOUD, COMPARE
    Result callouts shown across these slides:
      o Non-deterministic; standard and error output; generated files equivalent
      o Same results; results from Int. Extinction may vary
      o Genomic data; exact match
  • 58. Towards Reproducible Science Summarizing
    Two building blocks towards reproducibility of scientific experiments:
      o In vivo/vitro
          - Focus on providing structured descriptions of methods (laboratory protocols)
          - Our tools: ontologies, gazetteers, NLP tools, and automatic and manual annotation tools
          - Challenge: make protocols more structured (and semantic) from the beginning
      o In silico
          - Focus on the equipment (computational infrastructure) for workflow-based experiments
          - Ontologies, automatic and manual annotation tools, and an execution environment
          - Challenge: keep track of all types of appliances, and get scientists to provide annotations
    Is this enough?
  • 59. Towards Reproducible Science Summarizing: Is this enough? Clearly not, but it is a step forward towards ensuring reproducibility (with a focus on methods)
  • 60. Oscar Corcho (with contributions from Olga Giraldo, Alexander García, and Idafen Santana) Ontology Engineering Group Universidad Politécnica de Madrid, Spain Towards Reproducible Science: a few building blocks from my personal experience ocorcho@fi.upm.es @ocorcho 22/10/2017 S4BioDiv2017, Vienna
  • 61. Towards Reproducible Science Light pollution (www.stars4all.eu)

Editor's Notes

  • #2: Change the license to whichever one applies.
  • #3: Experiments are central to empirical science: they are the foundation on which experimental sciences are built and improved. They allow us to verify hypotheses defined according to the scientific method, and to convince the reader (other scientists) that the conclusions of a study are correct. For that, and to support the growth of science, they must be a repeatable process (both by the original scientist and by other scientists).
  • #4: In recent decades there has been an evolution in the way experimental science is conducted, adding computational resources for solving scientific problems. We have moved from a paradigm in which experiments were mainly conducted in laboratories or in nature, also referred to as in vitro or in vivo science, to a paradigm in which simulations and mathematical models, executed over computational resources, are used for obtaining scientific insights, also referred to as computational science or in silico science. Computational experiments complement rather than substitute classical experiments.
  • #5: In both cases, either in classical or computational experimental science, experiments must be a repeatable process, both for trusting the scientific results and for allowing the development of incremental research.
  • #8: In this context, we must define which kind of repeatability we are looking for and how we plan to achieve it. The first thing we have to do is define how we are going to take care of the object of interest, which can be done in two main ways. Preservation: the act of isolating the object, preventing any interaction that could damage it. Conservation: the set of actions for studying the object and its associated features, allowing a supervised or restricted use of it. These processes prolong the life of the object.
  • #9: Once a plan for taking care of the object has been stated, there are likewise two ways of obtaining a repetition of it. A replication: a copy of the original object that is as close as possible to the original. A reproduction: an object that exposes or mimics a certain set of features in the same way as the original one. In this work we explore how conservation techniques can be applied for experimental science reproducibility. For achieving this conservation and reproducibility…
  • #10: Any scientific experiment can be divided into three main components. DATA: the phenomena we study from nature: light from stars, genomes from plants or animals, reports in social science, etc. SCIENTIFIC PROCEDURE: the set of steps that have to be performed in order to obtain the results of the experiment. EQUIPMENT: the set of tools that scientists require in order to capture, process and interpret the desired data. From telescopes to microscopes, petri dishes or Bunsen burners, there is a wide range of tools depending on the scientific domain. All these components…
  • #11: All these components have a counterpart in the Computational Science world. DATA is often represented by means of tables in databases, structured files, or even web services providing data. The SCIENTIFIC PROCEDURE can be defined by source code written in a given language or by the description of a set of invocations of different tools… and, in recent decades, as Scientific Workflows, which have emerged as a paradigm for formally defining the set of data transformations that perform the scientific procedure of a computational experiment. Finally, the EQUIPMENT of a computational experiment is defined by the set of hardware and software resources that are required to execute the experiment. Some initiatives have…
  • #25: In our platform, users log in with an ORCID iD.
  • #26: We capture bibliographic data and information related to the description of the protocol, such as purpose, applications, advantages, limitations, etc.
  • #27: We capture a set of metadata for representing the sample. One of them is the name of the organism, and the name of the organism comes from…
  • #28: In the case of the reagents, we capture them from the PubChem API.
  • #29: Users can draw their workflows, describe each step or instruction, and capture additional information such as the equipment, reagents, kits and software that participate in each step. Users can also include alert messages, etc.
  • #33: Some initiatives have been proposed to target the reproducibility issues of the different components of experiments in computational science.
  • #34: DATA Examples: RDA, Open Provenance Model, MIBBI, VCR…
  • #35: Some initiatives have been proposed to target the reproducibility issues of the parts of computational experiments. SC. PROCEDURE Examples: Taverna, Pegasus, WINGS, Galaxy, SCUFL. WMS and their related WF languages are a way of encapsulating and preserving the scientific procedure in computational experiments. Platforms such as myExperiment allow it to be shared and reproduced.
  • #36: Finally, we found that there was a lack of approaches targeting the computational equipment by the time we started this work. Most of the work done in the area by that time focused on sharing virtual machine images as a way of providing exact copies of the execution environment. During the time of this work, some other initiatives have appeared targeting this problem, as we will discuss later. EQUIPMENT: There is a lack of initiatives in this aspect. Some projects have aimed to approach it during the time of this work. Most of them focus on the use of VMs -> BLACK BOXES (here we should motivate the need of exposing the knowledge about the execution environment for increasing the reproducibility). Examples: CernVM, ReproZip, TIMBUS. NOTE: LINK THIS ONE WITH THE FOLLOWING SLIDE ABOUT THE OPEN RESEARCH PROBLEMS
  • #37: To share your research materials (RO as a social object); to facilitate reproducibility and reuse of methods; to be recognized and cited (even for constituent resources); to preserve results and prevent decay (curation of workflow definition; using provenance for partial rerun). Middleware.
  • #40: The first open problem we identified is that… Open Research Problem 1: Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow. The majority of computational scientists develop their experiments with an already existing infrastructure in mind, thus not considering its definition as part of the experiment. Open Research Problem 2: Execution Environments are poorly described, or even not described at all, when describing the results of an experiment. Often, the infrastructure used in the evaluation process is summarized by briefly explaining its overall hardware capabilities and the basic software stack. This lack of information compromises the conservation and reproducibility of the experiment. Open Research Problem 3: Current approaches for Computational Scientific Experiments conservation and reproducibility take into account only the computational process of the experiment (scientific procedure) and the data used and produced, but not the execution environment.
  • #41: Open Research Problem 1: Computational Infrastructures are usually a predefined element of a Computational Scientific Workflow. The majority of computational scientists develop their experiments with an already existing infrastructure in mind, thus not considering its definition as part of the experiment.
  • #42: Open Research Problem 2: Execution Environments are poorly described, or even not described at all, when describing the results of an experiment. Often, the infrastructure used in the evaluation process is summarized explaining briefly its hardware overall capabilities and the basic software stack. This lack of information compromises the conservation and reproducibility of the experiment.
  • #43: Open Research Problem 3: Current approaches for Computational Scientific Experiments conservation and reproducibility take into account only the computational process of the experiment (scientific procedure) and the data used and produced, but not the execution environment. Based on this study, in this work we focus on the aspects related to the reproducibility of the computational EQUIPMENT of a scientific experiment defined as a computational scientific workflow.
  • #44: That is, a set of models for annotating the original environment, which can be used for specifying and reproducing a new, equivalent environment using cloud solutions.
  • #45: As a result of this process, we developed the WICUS ontology network, which is composed…
  • #46: The first ontology is the Workflow Execution Requirements ontology, which introduces the concept of workflow… Using this ontology we can describe the structure of a workflow, such as the ones depicted in this figure, which describes 3 workflows belonging to the Pegasus WMS, represented by the different figures and colors. Here we see how each workflow is composed of a set of subworkflows, each of them related to different execution requirements, as well as to the requirements defined by the WMS (Pegasus in this case).
  • #47: Dependencies: JAR files depend on the Java VM.
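A dependency such as "JAR files depend on the Java VM" implies an installation order when an environment is rebuilt. As a minimal sketch of that idea (the component names and the `deployment_order` helper are hypothetical, and this is not the WICUS implementation), Python's standard `graphlib` module can order components so that each one is installed after everything it depends on:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical software stack: each component maps to the components it needs.
dependencies = {
    "java-vm": set(),
    "workflow-tool.jar": {"java-vm"},
    "analysis-step.jar": {"java-vm", "workflow-tool.jar"},
}


def deployment_order(deps):
    """Return components ordered so that dependencies come first.

    Raises graphlib.CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = deployment_order(dependencies)
print(order)
```

Making the dependencies explicit in the description, rather than hiding them inside a virtual machine image, is what allows an order like this to be derived automatically.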
  • #51: Examples… Based on these models, which allow us to describe the execution environments of scientific workflows…
  • #52: This system is composed of three main stages, which process the available experimental materials to obtain the corresponding enactment files. These enactment files can be executed to deploy a reproduced execution environment. This overview can be decomposed into a set of modules and intermediate results generated during the process of reproducing an experiment. There are several input files and registries that can be used to extract information about the execution environment of the workflows: Wf spec (DAG, make, etc.); SW comp registry (TC); WMS annotations (manual); SVA catalog (manual).
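To make the role of the inputs listed above more concrete, here is a hedged sketch of one possible planning step: given the requirements of a workflow and a catalog of available appliances, pick an appliance that meets the hardware requirements and list the software that still has to be installed on it. All class names, fields and catalog entries are hypothetical and chosen for illustration; they are not taken from the WICUS ontologies or system:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Appliance:
    """Hypothetical catalog entry for a virtual appliance."""
    name: str
    cpu_cores: int
    memory_gb: int
    installed: frozenset = frozenset()


@dataclass(frozen=True)
class Requirements:
    """Hypothetical execution requirements extracted from annotations."""
    cpu_cores: int
    memory_gb: int
    software: frozenset


def plan_environment(req, catalog):
    """Pick the adequate appliance that needs the fewest extra installs.

    Returns the chosen appliance and the sorted list of missing software.
    """
    candidates = [
        a for a in catalog
        if a.cpu_cores >= req.cpu_cores and a.memory_gb >= req.memory_gb
    ]
    if not candidates:
        raise LookupError("no appliance in the catalog meets the hardware requirements")
    best = min(candidates, key=lambda a: len(req.software - a.installed))
    return best, sorted(req.software - best.installed)


catalog = [
    Appliance("base-linux-small", cpu_cores=2, memory_gb=4),
    Appliance("base-linux-java", cpu_cores=4, memory_gb=8,
              installed=frozenset({"java-vm"})),
]
req = Requirements(cpu_cores=2, memory_gb=4,
                   software=frozenset({"java-vm", "workflow-engine"}))
appliance, to_install = plan_environment(req, catalog)
print(appliance.name, to_install)
```

The list of missing software is the kind of information an enactment file would need in order to turn a generic appliance into an environment equivalent to the original one.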
  • #53: We evaluated a total of 6 different workflows. All of them expose different computational characteristics: from small ones, such as Internal Extinction, to really large ones, such as SoyKB; and from those requiring a small amount of time for execution, such as xcorr or Montage, to the ones requiring 20 to 24 hours, such as BLAST. All these workflows have been developed by different institutions and published in different conferences and journals. Some of them date from a decade ago, whereas others have been published recently. We selected them based on their domain, the availability of their materials, and support by their communities.
  • #54-#59: Executed the 6 workflows in their original context. Documented their execution environment. Executed the ISA, obtaining enactment scripts. Enacted the reproduced environments and executed the workflows. Workflow results were compared to the corresponding baseline executions. Montage, which generates an image of the sky: pHash similarity, factor 1.0, 0.85 factor. Epigenomics and SoyKB: non-deterministic; output files equal in terms of number of lines and content, with no errors. Internal Extinction and xcorr: exactly the same results, even though in the case of Internal Extinction they may vary. BLAST: equal results. With this we consider the reproduction of the execution environments to be successful.
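The comparison step described in these notes distinguishes exact matches from outputs that are only equivalent. A minimal sketch of two such checks follows. The function names are hypothetical, and treating "equal in terms of number of lines and content" as an order-insensitive comparison of lines is an assumption made here for non-deterministic workflows, not a description of the authors' scripts. The perceptual-hash comparison used for Montage images is not covered:

```python
import hashlib
from collections import Counter


def identical(original: bytes, reproduced: bytes) -> bool:
    """Exact match: both outputs have the same SHA-256 digest."""
    return hashlib.sha256(original).digest() == hashlib.sha256(reproduced).digest()


def equivalent_lines(original: str, reproduced: str) -> bool:
    """Same number of lines and same line contents, ignoring line order."""
    return Counter(original.splitlines()) == Counter(reproduced.splitlines())


baseline = "read_1\tchr1\nread_2\tchr2\n"
rerun = "read_2\tchr2\nread_1\tchr1\n"
print(identical(baseline.encode(), rerun.encode()))
print(equivalent_lines(baseline, rerun))
```

The first check suits deterministic workflows; the second tolerates outputs whose records are written in a different order on each run.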
  • #62: Change the license to whichever one applies.