What was the plan?
A role for data standards, models and computational workflows in scholarly data publishing
Alejandra González-Beltrán, PhD
Philippe Rocca-Serra, PhD
Oxford e-Research Centre, University of Oxford
{alejandra.gonzalezbeltran,philippe.rocca-serra}@oerc.ox.ac.uk
ISMB Workshop: What Bioinformaticians need to know about digital publishing beyond the PDF2
July 15th, 2014, Boston, USA
The experimental workflow
[Figure: workflow diagram centred on the Data Scientist, with stages Planning, Data Collection, Data Management, Analysis, Visualization and Publication, and two entry points: "Use existing data" and "Perform new experiment"]
[Figure: the experimental workflow diagram repeated over several slides, annotated in turn with: metadata, Data Interoperability, Reproducibility, Data Review, Data Reusability]
The experimental plan - life sciences case
InnoMed PredTox Project
2-week systemic rat study using male Wistar rats (N=15 per dose group)
14 proprietary drug candidates from participating companies and 2 reference toxic compounds
• experimental design
• sample characteristic(s)
• experimental variable(s)
• technology(ies)
• measurement(s)
• protocol(s)
• data file(s)
• …
The experimental plan - computational case
Evaluation of the SOAPdenovo2 tool for the de novo assembly of genomes from short DNA reads produced by next-generation sequencing; SOAPdenovo2 implements improvements over the SOAPdenovo1 assembler.
• open peer-review
• availability of
  • data
  • analysis scripts
  • documentation
Predictor Variables (Factor Name, Factor Type)
• genome assembly algorithm
• genome size
The experimental plan - computational case
Predictor Variables (Factor Name, Factor Type)
• genome assembly algorithm: SOAPdenovo2, SOAPdenovo1, ALL-PATHS-LG
• genome size: bacterial genome, insect genome, human genome
3x3 factorial design
9 study groups
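The 3x3 design can be enumerated directly from the two factors. The following is a minimal sketch, not the authors' code: the function and variable names are hypothetical, and the factor levels are copied from the slide as written.

```python
from itertools import product

# Factor levels as named on the slide.
ALGORITHMS = ["SOAPdenovo2", "SOAPdenovo1", "ALL-PATHS-LG"]
GENOME_SIZES = ["bacterial genome", "insect genome", "human genome"]


def study_groups(algorithms, genome_sizes):
    """Full factorial design: every algorithm paired with every genome size."""
    return list(product(algorithms, genome_sizes))


groups = study_groups(ALGORITHMS, GENOME_SIZES)
for number, (algorithm, size) in enumerate(groups, start=1):
    print(f"group {number}: {algorithm} on {size}")
print(len(groups), "study groups")
```

Each study group is one combination of factor levels, so three algorithms crossed with three genome sizes give the nine groups stated above.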
The experimental plan - computational case
Genomes in the study groups: S. aureus, R. sphaeroides, B. impatiens, Chinese Han genome (or YH genome)
The experimental plan - computational case
Response Variables
genome coverage
computation run time
memory consumption
http://www.ama-rochester.org/WP/wp-content/uploads/2013/01/three-pillars.png
A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework (ISA-Tab and/or tools) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• also used by communities working to build a library of cellular signatures
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
General-purpose, configurable format designed to support:
• description of the experimental metadata, making the annotation explicit and discoverable
• provenance tracking
• use of community standards, such as minimal reporting guidelines and terminologies
• conversion to a growing number of other metadata formats, e.g. those used by the European Bioinformatics Institute (EBI) repositories
[Figure: the same experiment shown as ISA-Tab table rows and as a graph]
Organism: H. Sapiens
Sources: H1 (35 Years), H2 (33 Years)
Samples: H1.sample1, H1.sample2, H2.sample1
Protocols: Labeling, Scanning
Labeled extracts: H1.sample1.labeled, H2.sample1.labeled, ...
Raw data files: h1-s1.cel, h1-s2.cel, h2-s1.cel
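The relationship between the tabular and the graph view can be sketched as follows. This is an illustration under stated assumptions, not ISA tooling: the tuple layout and the function name are hypothetical, and the labeled extract for H1.sample2 is elided on the slide, so it is left as None rather than guessed.

```python
from collections import defaultdict

# One tuple per table row: (source, sample, labeled extract, raw data file).
# The labeled extract for H1.sample2 is not shown on the slide ("..."),
# so it is None here and that step is skipped when building the graph.
ROWS = [
    ("H1", "H1.sample1", "H1.sample1.labeled", "h1-s1.cel"),
    ("H1", "H1.sample2", None, "h1-s2.cel"),
    ("H2", "H2.sample1", "H2.sample1.labeled", "h2-s1.cel"),
]


def rows_to_graph(rows):
    """Return {node: set of nodes derived from it}, merging repeated nodes."""
    graph = defaultdict(set)
    for row in rows:
        path = [cell for cell in row if cell is not None]
        for parent, child in zip(path, path[1:]):
            graph[parent].add(child)
    return graph


graph = rows_to_graph(ROWS)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Source H1 appears in two rows of the table but becomes a single node with two outgoing edges, which is the kind of redundancy the graph view removes.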
obi:material entity
obi:material sample
obi:material processing
obi:processed material
obi:planned process
isa:raw data file
bfo:derives from
ISMB Workshop 2014
http://gigasciencejournal.com
http://gigadb.org/dataset/100035
Experimental metadata or structured component (in-house curated, machine-readable formats)
Article or narrative component (PDF and HTML)
A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these
• Credit for sharing your data
• Focused on reuse and reproducibility
• Peer reviewed, curated
• Promoting Community Data Repositories
• Open Access
SOAPdenovo2
http://isa-tools.github.io/soapdenovo2
• Galaxy workflows to re-enact the data analysis
• Nanopub: represents structured data along with its provenance in a single publishable and citable entity
• ResearchObject: enables the aggregation of the digital resources contributing to findings of computational research, including results, data and software, as citable compound digital objects
Reproducing SOAPdenovo2 results
Galaxy workflows
S. aureus pipeline
“genome coverage increased over the human data when comparing SOAPdenovo2 against SOAPdenovo1”
Response Variables
genome coverage
memory consumption
OntoMaton: a Bioportal powered Ontology widget for Google Spreadsheets
Maguire et al, 2013, Bioinformatics
widget for ontology annotation and tagging on Google spreadsheets relying on BioPortal and Linked Open Vocabularies services
NanoMaton https://github.com/ISA-tools/NanoMaton
Ontology for Biomedical Investigations
Semanticscience Integrated Ontology
[Figure: the experimental workflow diagram, repeated]
FAIR data: Findable, Accessible, Interoperable, Reusable
Contributing to MetaboLights and ISA
• BBSRC UK-China Award & BGI funded Hackathon
• venue: BGI Hong Kong
• Participants: MetaboLights/BGI/ISA/Birmingham/Hong Kong University
• Outcome:
  • ISAtab web viewer code
  • Functional Specifications & Code for DoE Wizard API
  • 4 datasets coded in ISA format
  • Conversion of MetaboLights datasets to RDF
funders
acknowledgements
Scott Edmunds, GigaScience
Peter Li, GigaScience
Jun Zhao, Lancaster University
María Susana Avila García, Oxford University
Marco Roos, Leiden University
Mark Thompson, Leiden University
Ruibang Luo, University of Hong Kong
Tin-Lap Lee, Chinese University of Hong Kong
Tak-wah Lam, University of Hong Kong
Questions?
You can email us: isatools@googlegroups.com
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools
View our websites
View our Git repo & contribute: http://github.com/ISA-tools
Thanks for your attention!
