Trends in Use of Scientific Workflows:
Insights from a Public Repository and
Recommendations for Best Practices
Richard Littauer, Karthik Ram, Bertram Ludäscher, William
Michener, Rebecca Koskela

DataONE

Scientific Workflows
Tools that help scientists:

  • Automate repetitive or difficult work

  • Make their experiments reproducible

  • Track provenance

  • Share their data with other scientists

Workflow Workbenches
These facilitate:

  • Creation

  • Mapping

  • Scheduling

  • Execution

  • Visualization

  • Re-Use

Example Workflow

http://www.myexperiment.org/workflows/140.html

Our Study

• How are workflows being used?
• How are they being shared?
• What sort of best practices can researchers follow to maximize
  the longevity and use of their work?

http://www.flickr.com/photos/eleaf/2536358399

Our Study

• www.myexperiment.org
  • Est. 2007
  • 5000+ users
  • 2000+ workflows (mostly Taverna 1, Taverna 2, and RapidMiner)
  • Minable RDF storage for workflows, groups, packs, users, and files
  • Minable data gathered through the Scufl XML language for the
    Taverna workflows
  • Taverna 1: 479 workflows; Taverna 2: 684 workflows

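As a rough illustration of what mining a workflow definition can look like, the sketch below counts components and embedded workflows in a small XML document using Python's standard library. The element names (`workflow`, `processor`) and the sample document are hypothetical stand-ins, not the actual Scufl schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified workflow document; real Taverna files use a different schema.
SAMPLE = """
<workflow name="demo">
  <processor name="fetch_sequence"/>
  <processor name="split_lines"/>
  <processor name="nested">
    <workflow name="inner">
      <processor name="align"/>
    </workflow>
  </processor>
</workflow>
"""

def summarize(xml_text):
    """Return (component count, number of embedded workflows)."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so components of nested workflows are counted too.
    components = sum(1 for _ in root.iter("processor"))
    # Every <workflow> element other than the root is an embedded workflow.
    embedded = sum(1 for _ in root.iter("workflow")) - 1
    return components, embedded

print(summarize(SAMPLE))
```

Counting over the whole tree is a design choice for this sketch; a study could equally count only top-level components and report nested ones separately.
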
Our Study

• We harvested information using a combination of SPARQL and
  Python (https://github.com/RichardLitt/Understanding-Workflows)

• We gathered view and download statistics, metadata, descriptions,
  tags, and so on for users, workflows, files, packs, and groups
  (http://thedatahub.org/dataset/myexperiment-screenscrape)

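A harvesting step of this kind might look like the sketch below: compose a SPARQL query in Python, then flatten the endpoint's response into plain rows. The `ex:` vocabulary, the property names, and the sample response are all hypothetical; only the response layout follows the W3C SPARQL JSON results format. This is not the authors' script, which lives in the GitHub repository linked in the slide.

```python
import json

def build_query(limit):
    """Compose a SELECT query; the ex: terms are placeholders, not a real ontology."""
    return (
        "PREFIX ex: <http://example.org/terms#>\n"
        "SELECT ?workflow ?downloads WHERE {\n"
        "  ?workflow a ex:Workflow ;\n"
        "            ex:downloads ?downloads .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )

def rows_from_results(payload):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so fall back to None.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows

# Hand-written sample response, standing in for what an endpoint would return.
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["workflow", "downloads"]},
    "results": {"bindings": [
        {"workflow": {"type": "uri", "value": "http://example.org/workflows/1"},
         "downloads": {"type": "literal", "value": "12"}},
        {"workflow": {"type": "uri", "value": "http://example.org/workflows/2"}},
    ]},
})

rows = rows_from_results(SAMPLE_RESPONSE)
print(build_query(10))
print(rows)
```

Keeping query construction and result parsing in separate functions lets the parsing be checked against a saved response without contacting any server.
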
Findings
           • A large percentage of workflows consist of few
             components.

           • The number of components ranges from 1 to 250. The
             average workflow supports 24.3 tasks.

           • Complex workflows are downloaded more.

Findings
           • Most workflow contributors submit a single workflow.

           • Only 13 users have uploaded more than 30 workflows.

           • Just over 5% of the users on myExperiment have uploaded
             workflows.

Findings
           • Most workflows have only one version uploaded.

           • Workflows with several versions are downloaded more
             frequently than “single-edition” workflows.

Findings
           • Workflow use declined significantly a month after
             initial upload.

Findings

• A large percentage of workflow components (approx. 38%) are shims.

  • Shims are components that make the output of one step
    conform to the format expected by a subsequent step.

  • This is a problem for developers.

  • This is 8% more than reported in previous studies (Lin et al.).

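To make the idea of a shim concrete, here is a toy pipeline in Python. All three functions are hypothetical: `fetch_ids` emits a comma-separated string, `lookup` expects a list of integers, and `ids_shim` exists only to convert one format into the other.

```python
def fetch_ids():
    """Upstream step: returns identifiers as one comma-separated string."""
    return "3, 1, 2"

def lookup(ids):
    """Downstream step: expects a list of integers."""
    names = {1: "alpha", 2: "beta", 3: "gamma"}
    return [names[i] for i in ids]

def ids_shim(raw):
    """Shim: does no analysis, only reshapes the output of fetch_ids for lookup."""
    return [int(part) for part in raw.split(",") if part.strip()]

result = lookup(ids_shim(fetch_ids()))
print(result)
```

Only two of the three steps in this toy pipeline do any real work; the third is pure format conversion, which is the kind of overhead the shim figure describes.
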
Findings

• 60% of workflows have embedded workflows within them.

• On-site documentation (tags, descriptions) does not improve
  use, but community engagement does.

Recommendations

Remember workflows are evolving entities.

They are updated in response to user
feedback, engagement, and improvements in
methodology.

Recommendations

Use relevant social annotation tools.

But annotations need to be constrained; for
instance, through the use of a controlled tag
vocabulary.

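One way to constrain free-form tags is to map them onto a fixed list before they are stored. The sketch below shows the idea in Python; the vocabulary and the synonym table are invented for illustration and do not come from myExperiment.

```python
# Hypothetical controlled vocabulary and synonym table.
VOCABULARY = {"bioinformatics", "text-mining", "sequence-alignment"}
SYNONYMS = {"textmining": "text-mining", "alignment": "sequence-alignment"}

def normalize_tags(raw_tags):
    """Split user tags into accepted vocabulary terms and rejected leftovers."""
    accepted, rejected = [], []
    for tag in raw_tags:
        # Canonical form: trimmed, lower-case, spaces replaced by hyphens.
        key = tag.strip().lower().replace(" ", "-")
        key = SYNONYMS.get(key, key)
        if key in VOCABULARY:
            if key not in accepted:  # drop duplicates, keep first-seen order
                accepted.append(key)
        else:
            rejected.append(tag)
    return accepted, rejected

accepted, rejected = normalize_tags(
    ["Text Mining", "textmining", "Alignment", "cool stuff"]
)
print(accepted, rejected)
```

Returning the rejected tags rather than silently discarding them leaves room for a curator to decide whether the vocabulary should grow.
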
Recommendations

Talk about your workflows.

Cite the workflow in publications.
Share it with colleagues.
Advertise the workflow.

Recommendations

Provide sufficient descriptions of your workflows.

Recommendations

Keep in mind that one size does not fit all.

Recommendations

Workflow re-use could benefit significantly from
the assignment of stable identifiers, such as Digital
Object Identifiers (DOIs).

Recommendations

Education is the key to more use.

For example, at professional society meetings, in online
courses, and in undergraduate and graduate courses.

Impact on Science
Following these recommendations can help:
• Make science more efficient.
• Facilitate reproducible science.
• Support collaborative research.
• Speed up the peer review process.
• Increase your impact. (For instance, NSF has said these
  are valuable contributions.)

Links
• Mendeley Research Group:
  http://www.mendeley.com/groups/1189721/scientific-workflows-and-workflow-systems/
• GitHub https://github.com/RichardLitt/Understanding-Workflows
• Data http://thedatahub.org/dataset/myexperiment-screenscrape
• Notebook https://notebooks.dataone.org/workflows

http://www.flickr.com/photos/wwworks/4759535950/
