Date: 04/05/2015
Creation of abstractions
in scientific workflows
Daniel Garijo Verdejo,
Oscar Corcho,
Yolanda Gil
Ontology Engineering Group. Laboratorio de Inteligencia Artificial
Departamento de Inteligencia Artificial
Facultad de Informática
Universidad Politécnica de Madrid
Overview: In silico scientific workflows
Benefits:
•Sharing and reusing previous work
•Time savings: re-execution of old experiments with different parameters.
•Teaching: new students can learn existing methods in the lab
•Design for modularity, so others can reuse
•Design for standardization, reduction of heterogeneity
•Debugging of executions
•Paper writing: linking execution pipelines to publications.
•Reproducibility.
•Etc.
[Figure: the experiment's lab book and its digital log; the laboratory protocol (recipe) and the workflow.]
Hypotheses
Scientific workflow repositories can be mined automatically
to extract reusable patterns and abstractions that are
useful for workflow developers aiming to reuse existing
workflows.
•H1: It is possible to define common domain-independent
patterns based on the functionality of workflow steps.
•H2: It is possible to detect common reusable patterns
automatically.
•H3: Common reusable patterns are potentially useful for users.
Challenges
•Workflow representation
•Heterogeneous representations.
•Lack of a standard
•Lack of methodologies for publishing workflows.
•Workflow abstraction
•There are no catalogs of the typical abstractions that can be found in
scientific workflows based on their basic step functionality.
•Difficulty in relating workflows.
•Workflow reuse
•Difficult to determine which parts of a workflow could be reused in
another workflow
•Workflow annotation and documentation
•Manual process
Approach
Vocabularies and methodologies for representing and publishing workflows
[Figure: architecture for publishing workflows. Three WINGS deployments (on a local laptop, on a shared host, and on a web server), each with Core, Portal, Workflow Template, Workflow Instance and PROV export components; Wings workflow generation; OPM/PROV conversion; Linked Data publication into an RDF triple store; interactive browsing (Pubby frontend) and programmatic access (external apps) for users and other workflow environments. Stages: publication, share, reuse. Other labels: workflow provenance, workflow plan, methodology for workflow publishing.]
Repository of linked workflows: http://www.opmw.org/sparql
Vocabularies: http://purl.org/net/p-plan, http://www.opmw.org/ontology/
Daniel Garijo and Yolanda Gil. 2011. A new approach for publishing workflows: abstractions, standards, and linked data. In WORKS '11. ACM, New York, NY, USA, pp. 47-56.
Daniel Garijo and Yolanda Gil. 2012. Augmenting PROV with Plans in P-PLAN: Scientific Processes as Linked Data. In Proceedings of the 2nd International Workshop on Linked Science 2012, Boston.
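To make the representation idea more concrete, here is a minimal sketch of how a workflow template might be expressed with P-Plan terms (`Plan`, `Step`, `Variable`, `isStepOfPlan`, `hasInputVar`, `isOutputVarOf`, `isVariableOfPlan`) and written out as N-Triples. It assumes a two-step template whose example.org identifiers and step and variable names are all hypothetical. It uses only the Python standard library and is not the WINGS or OPMW export code.

```python
# Minimal sketch (not the WINGS/OPMW export code): describe a workflow template
# with P-Plan terms and serialise it as N-Triples, standard library only.
# Assumption: every example.org URI and every step/variable name is hypothetical.

PPLAN = "http://purl.org/net/p-plan#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/workflow/"  # hypothetical base URI for the template

Step = tuple[str, list[str], list[str]]  # (step name, input variables, output variables)


def template_triples(plan: str, steps: list[Step]) -> list[tuple[str, str, str]]:
    """Build (subject, predicate, object) IRI triples for a plan and its steps."""
    plan_uri = EX + plan
    triples = [(plan_uri, RDF_TYPE, PPLAN + "Plan")]
    declared = set()  # variables already typed, so each one is declared once
    for name, inputs, outputs in steps:
        step_uri = EX + name
        triples.append((step_uri, RDF_TYPE, PPLAN + "Step"))
        triples.append((step_uri, PPLAN + "isStepOfPlan", plan_uri))
        for var in inputs:
            triples.append((step_uri, PPLAN + "hasInputVar", EX + var))
        for var in outputs:
            triples.append((EX + var, PPLAN + "isOutputVarOf", step_uri))
        for var in inputs + outputs:
            if var not in declared:
                declared.add(var)
                triples.append((EX + var, RDF_TYPE, PPLAN + "Variable"))
                triples.append((EX + var, PPLAN + "isVariableOfPlan", plan_uri))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise IRI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    example = [
        ("Normalize", ["rawData"], ["normalizedData"]),
        ("Cluster", ["normalizedData"], ["clusters"]),
    ]
    print(to_ntriples(template_triples("ClusteringTemplate", example)))
```

A fuller version would more likely use an RDF library and also link executions back to the plan with PROV; this sketch stops at the template level.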
Definition of workflow abstractions
Catalog of common domain-independent
workflow abstractions (motifs)
Data-oriented motifs: what kinds of data
manipulation does the workflow
perform?
Workflow-oriented motifs: how does
the workflow perform its operations?
Analysis of 260 workflows from 10 domains,
belonging to 5 different workflow systems
http://purl.org/net/wf-motifs#
Daniel Garijo, Pinar Alper, Khalid Belhajjame, Oscar Corcho, Yolanda Gil, and Carole Goble. 2014. Common motifs in scientific workflows: An empirical analysis. Future Generation Computer Systems, Volume 36, July 2014, pp. 338-351.
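An empirical motif analysis ultimately rests on counting how often each motif shows up across a corpus. The sketch below shows that counting step, assuming the workflow steps have already been annotated with motif labels. The workflow ids, the labels assigned to them, and the `motif_occurrence` helper are hypothetical; the authoritative catalogue is the wf-motifs ontology, which this sketch does not load.

```python
from collections import Counter


# Hypothetical manual annotations: workflow id -> motif label of each step.
# The labels are illustrative data-oriented motif names, not a full catalogue.
def motif_occurrence(annotations: dict[str, list[str]]) -> dict[str, float]:
    """Return, for each motif, the fraction of workflows containing it at least once."""
    if not annotations:
        return {}
    counts: Counter = Counter()
    for motifs in annotations.values():
        counts.update(set(motifs))  # a motif counts once per workflow
    total = len(annotations)
    # most_common() orders motifs from most to least frequent
    return {motif: n / total for motif, n in counts.most_common()}


if __name__ == "__main__":
    corpus = {
        "wf1": ["Data Retrieval", "Data Preparation", "Data Analysis"],
        "wf2": ["Data Retrieval", "Data Analysis", "Data Analysis"],
        "wf3": ["Data Preparation", "Data Visualization"],
        "wf4": ["Data Retrieval"],
    }
    for motif, share in motif_occurrence(corpus).items():
        print(f"{motif}: {share:.0%}")
```

Counting each motif once per workflow, rather than once per step, keeps a single workflow with many repeated steps from dominating the figures.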
Finding and evaluating common abstractions
https://github.com/dgarijo/FragFlow
http://purl.org/net/wf-fd
[Figure: components of the approach: graph mining techniques; workflow fragment representation and linkage; workflow fragment filtering techniques.]
Daniel Garijo, Oscar Corcho, Yolanda Gil, Boris A. Gutman, Ivo D. Dinov, Paul Thompson, and Arthur W. Toga. 2014. FragFlow: Automated Fragment Detection in Scientific Workflows. In The 10th IEEE International Conference on e-Science, Guarujá, Brazil.
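FragFlow relies on graph mining techniques. The code below is a deliberately simplified stand-in for that idea, not FragFlow itself. It assumes each workflow is a graph of typed steps connected by dataflow edges and mines only linear fragments (step-type sequences along directed paths). It keeps those fragments whose support, meaning the number of workflows that contain them, reaches a minimum threshold. General subgraph miners also find branching fragments, which this sketch ignores. The `Workflow` shape and both helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical model: step id -> step type, plus dataflow edges between step ids.
Workflow = tuple[dict[str, str], list[tuple[str, str]]]


def path_fragments(workflow: Workflow, max_steps: int) -> set[tuple[str, ...]]:
    """Step-type sequences along directed paths with 2..max_steps steps."""
    types, edges = workflow
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    found: set[tuple[str, ...]] = set()

    def extend(path: list[str]) -> None:
        if len(path) >= 2:
            found.add(tuple(types[step] for step in path))
        if len(path) < max_steps:  # the length bound also stops cycles
            for nxt in successors[path[-1]]:
                extend(path + [nxt])

    for start in types:
        extend([start])
    return found


def frequent_fragments(corpus: list[Workflow], min_support: int,
                       max_steps: int = 3) -> dict[tuple[str, ...], int]:
    """Keep fragments that occur in at least min_support workflows."""
    support: Counter = Counter()
    for workflow in corpus:
        # path_fragments returns a set, so each workflow adds at most 1 per fragment
        support.update(path_fragments(workflow, max_steps))
    return {frag: n for frag, n in support.items() if n >= min_support}


if __name__ == "__main__":
    corpus = [
        ({"a1": "Retrieve", "a2": "Normalize", "a3": "Cluster"}, [("a1", "a2"), ("a2", "a3")]),
        ({"b1": "Retrieve", "b2": "Normalize", "b3": "Plot"}, [("b1", "b2"), ("b2", "b3")]),
        ({"c1": "Normalize", "c2": "Cluster"}, [("c1", "c2")]),
    ]
    print(frequent_fragments(corpus, min_support=2))
```

Raising `min_support` trades recall for more widely shared fragments, which is the same lever the evaluation varies as the pattern frequency.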
Evaluation and results
Scientific workflow repositories can be mined automatically to extract reusable patterns
and abstractions that are useful for workflow developers aiming to reuse existing
workflows.
•Evaluation 1: Comparison against the patterns users defined in the corpus
•Are our patterns similar to the ones users identified as useful?
•Depending on the minimum pattern frequency used, up to 75% of the detected
patterns are the same as the ones defined by users.
•Evaluation 2: User survey
•Are the detected patterns that are disjoint from the user-defined ones
useful?
•66%-100% of the proposed patterns were considered useful
•Survey conducted on three corpora.
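The following sketch shows one way to compute the comparison in Evaluation 1. It assumes the metric is the fraction of detected patterns that match a user-defined pattern, and that varying the minimum pattern frequency is what changes that fraction. The patterns left over after removing the user-defined ones are the candidates a survey like Evaluation 2 would show to users. All pattern names and counts are hypothetical and are not the evaluation's data.

```python
def overlap_with_users(detected: set[str], user_defined: set[str]) -> float:
    """Fraction of detected patterns that coincide with a user-defined pattern."""
    if not detected:
        return 0.0
    return len(detected & user_defined) / len(detected)


def sweep_min_frequency(support: dict[str, int], user_defined: set[str],
                        thresholds: list[int]) -> dict[int, float]:
    """Re-filter the detected patterns at each minimum frequency and report the overlap."""
    results = {}
    for threshold in thresholds:
        detected = {p for p, n in support.items() if n >= threshold}
        results[threshold] = overlap_with_users(detected, user_defined)
    return results


def survey_candidates(detected: set[str], user_defined: set[str]) -> set[str]:
    """Detected patterns disjoint from the user-defined ones (to be rated by users)."""
    return detected - user_defined


if __name__ == "__main__":
    support = {"P1": 5, "P2": 4, "P3": 3, "P4": 2, "P5": 2}  # hypothetical counts
    user_defined = {"P1", "P3"}
    print(sweep_min_frequency(support, user_defined, [2, 3, 5]))
    print(sorted(survey_candidates(set(support), user_defined)))
```

Higher thresholds tend to leave fewer but more widely shared patterns, so the overlap figure depends strongly on where the threshold is set.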
Summary
•Workflow representation
•Models based on standards for representing workflow provenance and
workflow templates
•Adapted a commonly used methodology for publishing workflows as web
objects.
•Workflow abstraction
•Defined a catalog of common domain-independent abstractions, based on
their functionality.
•Provided an ontology for semi-automatic annotation.
•Workflow reuse
•Automatic detection and annotation of common useful patterns in a
workflow corpus.
•Models describing how patterns link different workflows in a
workflow corpus.
Collaborators and co-authors
•Daniel Garijo, Oscar Corcho
Ontology Engineering Group, UPM
•Yolanda Gil
Information Sciences Institute, USC
•Boris A. Gutman, Ivo D. Dinov, Paul Thompson, Arthur W. Toga,
Meredith N. Braskie, Derrek Hibar, Xue Hua, Neda Jahanshad
USC Laboratory of Neuro Imaging
IEEE eScience 2014, Guarujá, Brazil
•Pinar Alper, Khalid Belhajjame, Carole Goble

Creating abstractions from scientific workflows: PhD symposium 2015


Editor's Notes

  • #3: Explain the context: what scientific workflows are and what their benefits are.