Grant agreement no.: 27092




Workflow Preservation
José Enrique Ruiz, Lourdes Verdes-Montenegro, Susana Sánchez,
Juan de Dios Santander-Vela and the Wf4Ever Team
IAA-CSIC

January 18th, 2012
7th Workflow Working Group Meeting - ASOV France

Who am I?

Instituto de Astrofísica de Andalucía (IAA-CSIC)

AMIGA Group

Analysis of the interstellar Medium of Isolated Galaxies

Statistical baseline of isolated galaxies to compare
with the behaviour of galaxies in denser environments

Multi-wavelength study of ~1000 galaxies
+
Need for intensive and complex analysis of 3D data
(2 spatial dimensions + 1 velocity dimension)

IAA-CSIC
Univ. Granada, Obs. Marseille, Obs. Paris, NAOJ,
FCRAO, UNAM, Univ. Edinburgh, IRAM, ESO,
Kapteyn Astronomical Institute

P.I. Lourdes Verdes-Montenegro
http://amiga.iaa.es

What is Wf4Ever?

EU-funded FP7 STREP project
December 2010 – December 2013

[Map of partner locations, numbered as in the list below]
1. Intelligent Software Components (ISOCO, Spain)
2. University of Manchester (UNIMAN, UK)
3. Universidad Politécnica de Madrid (UPM, Spain)
4. Poznan Supercomputing and Networking Centre (PSNC, Poland)
5. University of Oxford (OXF, UK)
6. Instituto de Astrofísica de Andalucía (IAA, Spain)
7. Leiden University Medical Centre (LUMC, NL)

What is Wf4Ever?

Technological infrastructure for the preservation, efficient retrieval and reuse
of scientific workflows in a range of disciplines

Partners
•  One SME
•  Six public organizations

Technological Core Competencies
•  Digital Libraries
•  Workflow Management
•  Semantic Web
•  Integrity & Authenticity
•  Provenance
•  Information Quality

Case Studies
•  Astronomy (IAA)
•  Genome-wide Analysis and Biobanking (LUMC)

Goals
•  Archival, classification and indexing of scientific workflows and their
   associated materials in scalable semantic repositories, providing
   advanced access and recommendation capabilities
•  Creation of scientific communities to collaboratively share, reuse and
   evolve workflows and their parts, stimulating the development of new
   scientific knowledge

What are our scientific workflows?

A combination of data and processes into a configurable
and structured set of steps that implement semi-automated
computational solutions to problems

Types of workflows in Astronomy
•  Personal script-based recipes
   Python, IDL, other software, etc.
•  Multi-archive VO recipes
•  Internal group developments
   GRID, clusters, etc.
•  Processing pipelines
   Provide data, computing infrastructure, tools, etc.

Wfs on steroids
•  Scientifically exploitable results vs. scientific insight
•  Easily accessible and reproducible (shared)

Why is workflow preservation important?

Astronomy research is entirely digital
The time has come to go "Beyond the PDF"

Preserved experiments
•  Methodology "in action"
•  All data are exposed
•  Reproducible
•  Repeatable
•  Re-usable
•  Re-purposable
•  Participatory
•  Collaborative
•  Formative

Discoverable · Trust assessment · Social aspect

Related Initiatives

Cyber-SKA
Provides infrastructure that will be required to address the needs of
future radio telescopes such as the Square Kilometre Array

Web-based workflow builder
•  Image segmentation
•  Image mosaicking (Montage)
•  Spatial reprojection
•  Plane extraction from data cubes

IceCore
Web portal for executing workflows (University of Helsinki)
Common interface for Wfs distributed across different engine servers

Related Initiatives

Montage
•  FITS image mosaicking
•  Toolkit for desktops, clusters and grids

Astro-WISE
•  Distributed data storage and computing infrastructure
•  Tracks the process provenance of final data products
•  Calibration and analysis of images

Helio-VO
•  Solar physics Virtual Observatory
•  Enables workflow execution via Taverna Server

Workflows VO France
•  Provides use cases, mainly VO-oriented
•  AÏDA Workflow System implements FITS validation with CharDM

Tools

Taverna
•  Strongly bioinformatics-oriented
•  Taverna Engine
•  Taverna Server
•  Taverna Workbench
Kepler
•  Generic science workflow system
Triana
•  Local execution
•  Clusters (RMI)
•  GRID
•  Web services

Tools

Aladin JLOW Plugin
The Aladin plugin API permits graphical replacement of Aladin tools

Tools

Aladin JLOW Plugin

Tools

ESO Reflex
Finland's in-kind contribution to ESO
•  Prototype/feasibility study
•  Initially based on Taverna 1
Current implementation based on Kepler

AstroTaverna
AstroGrid development
Prototype marrying VO Desktop and Taverna 1
Library of Taverna functions to access the VO Desktop API

Status
Wrapper libraries only for Taverna 1

Digital Repositories

The recipes store
Oxford e-Research Centre

•  Find workflows
•  Share workflows and files
•  Find people
•  Build communities
•  Publish packages
•  Tag workflows
•  Score and rate workflows
•  Comment on workflows
•  Write reviews

Digital Repositories

Astronomy in myExperiment
•  10 interested users
•  No VO-services-based Wfs
•  Some Helio Project Wfs
   •  VOTable parsing
   •  Internal services
   •  Astro-Shims
•  BioCatalogue vs. VO Registry

Astro-Wf4Ever specific Wfs
•  Catalogue queries

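A catalogue-query Wf of this kind typically chains a VO service call with a VOTable-parsing step. The Python sketch below illustrates that pattern under stated assumptions: `TAP_BASE_URL` is a hypothetical endpoint, the table name in the ADQL query is invented, and a hand-written VOTable snippet stands in for a real service response, so no network call is made. The URL uses the standard TAP synchronous-query parameters (`REQUEST=doQuery`, `LANG=ADQL`, `QUERY`, `FORMAT`); the parser only handles the TABLEDATA serialization.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical TAP service base URL, used only for illustration.
TAP_BASE_URL = "http://example.org/tap"


def build_tap_sync_url(base_url: str, adql: str, fmt: str = "votable") -> str:
    """Return a TAP synchronous-query URL for an ADQL string."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql, "FORMAT": fmt}
    return f"{base_url}/sync?{urlencode(params)}"


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}FIELD' -> 'FIELD'."""
    return tag.rsplit("}", 1)[-1]


def parse_votable(xml_text: str) -> list[dict[str, str]]:
    """Turn the first TABLE of a TABLEDATA-serialized VOTable into dict rows."""
    root = ET.fromstring(xml_text)
    table = next(el for el in root.iter() if _local(el.tag) == "TABLE")
    names = [f.get("name") for f in table.iter() if _local(f.tag) == "FIELD"]
    rows = []
    for tr in (el for el in table.iter() if _local(el.tag) == "TR"):
        cells = [(td.text or "").strip() for td in tr if _local(td.tag) == "TD"]
        rows.append(dict(zip(names, cells)))
    return rows


# Hand-written stand-in for a service response (placeholder values).
SAMPLE_VOTABLE = """<?xml version="1.0"?>
<VOTABLE version="1.2" xmlns="http://www.ivoa.net/xml/VOTable/v1.2">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="name" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>src_a</TD><TD>10.5</TD><TD>-3.25</TD></TR>
        <TR><TD>src_b</TD><TD>11.0</TD><TD>2.0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

if __name__ == "__main__":
    print(build_tap_sync_url(TAP_BASE_URL, "SELECT TOP 10 * FROM hypothetical.catalogue"))
    for row in parse_votable(SAMPLE_VOTABLE):
        print(row["name"], float(row["ra"]), float(row["dec"]))
```

A production Wf would more likely rely on a dedicated VOTable library (for example Astropy or STILTS); the standard-library version just makes the data flow between the query and parsing steps explicit.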
The upcoming context

Processes should benefit from the same privileges already acquired by data

Digital libraries of workflows may boost the use of the existing
data infrastructure (VO)

Users need templates!

Wf4Ever is also a project about
•  How to publish
•  How to do peer review
•  Improving visibility through reference and attribution

Publishers should play an important role

The upcoming context

The next generation of archives

Much wider FoV and spectral coverage
•  Huge datasets (~tens of TB)
•  Big Data science highly dependent on I/O data rates
•  Subproducts as virtual data generated on the fly

Automated surveys
•  Huge amounts of tabular data
•  Services for Knowledge Discovery in Databases

The upcoming context

We are moving into a world where
•  computing and storage are cheap
•  data movement is deadly

Archives should evolve from data providers into virtual data
and service providers, where web services may help to solve
bandwidth issues.

Archives speaking self-descriptive web services
•  Smaller virtual data subproducts
•  Distributed, multi-archive, multi-wavelength astronomy

Considerations

(Data) Workflow preservation

•  Workflows are interpreted through their execution
   •  Complex models are required to describe them
•  Severely vulnerable to obsolescence of
   •  Applications
   •  Libraries
   •  Operating environment
•  Provenance is a complex issue in a cloud of services
•  Resources are often beyond the control of scientists
•  Alleviate decay of external resources via alternative resources

Considerations

(Data) Workflow preservation

•  Versioning of the whole or its components
•  Restricted access to data and processes
•  Permissions, licenses, platform, costs, etc.
•  Semantic discovery of Wfs, processes, web services
•  Metrics for quality: usage statistics, uptime logs, etc.
•  Integrity evaluation
•  Completeness checking
•  Ensure trustworthiness and authenticity
•  Workflows for workflow curation

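Integrity evaluation and completeness checking can be illustrated with checksums. This is a minimal sketch, not the Wf4Ever I&A method: it assumes a workflow's components are plain files in one folder, records a SHA-256 digest for each, and later classifies every recorded component as missing (completeness), modified (integrity) or intact. The file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(root: Path, names: list[str]) -> dict[str, str]:
    """Record a checksum for every component that should belong to the workflow."""
    return {name: sha256_of(root / name) for name in names}


def evaluate(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Classify each recorded component as missing, modified or intact."""
    report: dict[str, list[str]] = {"missing": [], "modified": [], "intact": []}
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file():
            report["missing"].append(name)    # completeness check failed
        elif sha256_of(path) != expected:
            report["modified"].append(name)   # integrity check failed
        else:
            report["intact"].append(name)
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "workflow.t2flow").write_text("<workflow/>")
        (root / "input.vot").write_text("<VOTABLE/>")
        manifest = record_checksums(root, ["workflow.t2flow", "input.vot"])
        (root / "input.vot").write_text("<VOTABLE>edited</VOTABLE>")
        (root / "workflow.t2flow").unlink()
        print(evaluate(root, manifest))
        # {'missing': ['workflow.t2flow'], 'modified': ['input.vot'], 'intact': []}
```

Checksums only detect change; deciding whether a change is acceptable (a new version rather than corruption) still needs the versioning and provenance information listed above.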
A first approach in Workflow Preservation

Preserve, Retrieve, Reconstruct, Replay

•  Retrieve (characterization)
   •  Functionality of the Wf or its modules
   •  Inputs and outputs
   •  Metadata, authority, keywords
•  Reconstruct (semantics and modeling)
   •  Understand dependencies and components
   •  Technical specificities
•  Replay (execution tools)
   •  Check the success of the preservation method

•  Referenced and acknowledged (long-term IDs)

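The "Reconstruct" step (understanding dependencies and components) can be sketched as a dependency graph. This example assumes a hypothetical source-extraction workflow described as "step → steps whose outputs it consumes"; it reports steps that are referenced but never described, then derives a valid execution order with the standard library's `graphlib.TopologicalSorter` (Python 3.9+), raising an error if the declared dependencies contain a cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: step -> set of steps whose outputs it consumes.
STEPS: dict[str, set[str]] = {
    "query_archive": set(),
    "download_images": {"query_archive"},
    "extract_sources": {"download_images"},
    "crossmatch": {"extract_sources", "query_archive"},
    "publish_catalogue": {"crossmatch"},
}


def undeclared_components(dependencies: dict[str, set[str]]) -> list[str]:
    """Steps that are referenced as dependencies but never described themselves."""
    referenced = {dep for deps in dependencies.values() for dep in deps}
    return sorted(referenced - set(dependencies))


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """An order in which every step runs after all the steps it depends on."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes of the detected cycle.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


if __name__ == "__main__":
    print(undeclared_components(STEPS))
    print(execution_order(STEPS))
```

An empty list from `undeclared_components` is a cheap completeness signal before attempting a Replay; a non-empty list points at components the preserved description lost.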
Wf4Ever Update

RO: the Research Object

All components related to the research lifecycle of an
experiment should be available, preserved and easily retrievable:

•  Proposals
•  Data
•  Processes
•  Publications

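To make the idea concrete, here is a minimal sketch of a Research Object as an aggregation of the four component kinds listed on the slide. The JSON layout, the identifier scheme and the class names are hypothetical illustrations chosen for this example; they are not the Wf4Ever RO model, which is richer (annotations, provenance, links between components).

```python
import json
from dataclasses import asdict, dataclass, field

# Component kinds taken from the slide; everything else here is hypothetical.
KINDS = ("proposal", "data", "process", "publication")


@dataclass
class Component:
    uri: str
    kind: str
    description: str = ""

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown component kind: {self.kind!r}")


@dataclass
class ResearchObject:
    identifier: str
    title: str
    components: list[Component] = field(default_factory=list)

    def add(self, uri: str, kind: str, description: str = "") -> None:
        """Aggregate one more component into the RO."""
        self.components.append(Component(uri, kind, description))

    def missing_kinds(self) -> list[str]:
        """Lifecycle stages that have no component yet."""
        present = {c.kind for c in self.components}
        return [k for k in KINDS if k not in present]

    def to_json(self) -> str:
        """Serialize the RO (including nested components) as JSON."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    ro = ResearchObject("ro:hypothetical-0001", "Hypothetical catalogue curation experiment")
    ro.add("proposal.pdf", "proposal")
    ro.add("catalogue.vot", "data", "Input catalogue")
    ro.add("curation.t2flow", "process", "Taverna workflow")
    print(ro.missing_kinds())
    print(ro.to_json())
```

`missing_kinds` is a simple way to flag an RO that does not yet cover the whole research lifecycle.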
Astronomy WP in Wf4Ever

Development and Implementation of Golden Exemplars
•  Local catalogue curation based on VO archives
•  Source extraction and cross-matching from 2D images
•  Modeling and analysis of 3D velocity cubes of galaxies

Create a community of users
•  Development of prototypes and tools
•  Dissemination

Integrate existing astronomy software with Wf4Ever tools
•  SAMP and WebSAMP

Provide interoperable models, ontologies and vocabularies for the
characterization of workflows, processes and RO components

Astronomy WP in Wf4Ever

•  Characterization of the astronomy domain in Wfs
•  Detailed study of IVOA standards and web services
•  Exploration of similar initiatives for the curation of digital objects
•  Sociological study of astronomers and their working methodology
•  Extraction of user and technical requirements
•  Extraction of Taverna user requirements for astronomy
•  Implementation of the first Golden Exemplar
•  Early contacts in IVOA for the creation of a community of users

Wf4Ever Update

User requirements
•  Functional requirements for the Wf4Ever "working" platform
•  Focused on improving collaboration and reuse
•  Interoperability in exchanging scientific methodology
•  Expose experiments in a structured way so that others can understand them

RO Modeling
•  Model for interlinked components in a Research Object
•  Strategies for assessing integrity and authenticity
•  Attempts at metrics for information quality

We need to build what we would like to preserve

Wf4Ever Update!




Wf4Ever Update

Proposed improvements for Taverna
•  VO Registry access perspective
•  STILTS VOTable library integration
•  SAMP (connectivity with VO software)
•  Python-based Beanshells (done)
•  Simple standard functions for astronomy
•  ODBC connector to databases

Wf4Ever Update

Architecture
•  Search & Retrieval Service
•  Recommender Service
•  Integrity & Authenticity (I&A) Evaluation Service
•  Notification Service

User-tool prototypes
•  RO Command Line Tool
•  RO Annotator
•  ROBox

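One simple way a Recommender Service could rank related ROs is by keyword overlap. The sketch below assumes each RO carries a set of keywords (like the [galaxies][catalogs] tags in the RO page mock-up) and ranks the others by Jaccard similarity |A ∩ B| / |A ∪ B|. The RO names and keyword sets are hypothetical, and this is not a description of how the Wf4Ever recommender actually works.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, keywords: dict[str, set[str]], top_n: int = 3) -> list[tuple[str, float]]:
    """Rank the other ROs by keyword overlap with the target RO."""
    base = keywords[target]
    scored = [(ro, jaccard(base, tags)) for ro, tags in keywords.items() if ro != target]
    scored = [item for item in scored if item[1] > 0]   # drop unrelated ROs
    scored.sort(key=lambda item: (-item[1], item[0]))   # best first, ties by name
    return scored[:top_n]


# Hypothetical keyword annotations on four ROs.
KEYWORDS = {
    "ro-a": {"galaxies", "catalogs"},
    "ro-b": {"galaxies", "catalogs", "crossmatch"},
    "ro-c": {"galaxies", "3d-cubes"},
    "ro-d": {"genomics"},
}

if __name__ == "__main__":
    for ro, score in recommend("ro-a", KEYWORDS):
        print(f"{ro}: {score:.2f}")
```

Keyword overlap is only a baseline; usage statistics, citations and re-use counts could be blended into the score in the same way.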
Wf4Ever Update

ROBox

Seamless contribution to a
working collaborative platform

A shared folder in Dropbox
becomes a Working RO

Automatic generation of
metadata

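Automatic metadata generation from a shared folder can be sketched with the standard library alone. This is an illustration under assumptions, not ROBox itself: it walks a local folder (standing in for the synced Dropbox folder), and for each file records a relative path, size, a media type guessed from the extension with `mimetypes.guess_type`, and the modification time in UTC. The field names are hypothetical.

```python
import mimetypes
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_folder(folder: Path) -> list[dict]:
    """Return one metadata record per file found under `folder`."""
    records = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        info = path.stat()
        media_type, _encoding = mimetypes.guess_type(path.name)
        records.append({
            "path": path.relative_to(folder).as_posix(),
            "size_bytes": info.st_size,
            # Unknown extensions fall back to a generic binary type.
            "media_type": media_type or "application/octet-stream",
            "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        })
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        (shared / "notes.txt").write_text("Working RO for a hypothetical experiment\n")
        (shared / "data").mkdir()
        (shared / "data" / "sources.csv").write_text("name,ra,dec\n")
        for record in describe_folder(shared):
            print(record)
```

Records like these could then be turned into annotations on the Working RO; richer metadata (authors, workflow inputs and outputs) would need content-aware extractors.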
Wf4Ever Update

RO Digital Library and RO Import

Wf4Ever Update

•  Anatomy of a Research Object
•  Annotations on RO components
•  RO graphical representation
•  Data/session inspection (SAMP)

Wf4Ever Update

Wf4Ever Update

RO Visualization

Wf4Ever Update

[Mock-up of a Research Object page: keywords [galaxies][catalogs]; integrity and rating indicators; 36 downloads; 2 citations; 1 re-use; 4 comments; links to previous and next versions]

Wf4Ever Update

[Mock-up of the same Research Object page: keywords [galaxies][catalogs]; integrity and rating indicators; 36 downloads; 1 re-use; 4 comments; links to previous and next versions]

Wf4Ever Update

Notification Service for Authors
What should be notified?
•  Failures
•  Downloads
•  Annotations
•  Links/similarity to other ROs
•  Modifications on the Working RO
•  Acknowledgements

Notification Management Tool
Avoid spam

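The tension between notifying authors and avoiding spam can be sketched as a routing policy. The example assumes a hypothetical event structure and a hypothetical policy: failures are forwarded immediately, event kinds an author has muted are dropped, and everything else is folded into a per-author digest that counts repeated events instead of sending one message each.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One notifiable event about a Research Object (hypothetical structure)."""
    author: str
    kind: str  # e.g. "failure", "download", "annotation", "modification"
    ro: str


# Hypothetical policy: failures are sent at once, everything else is batched.
URGENT_KINDS = {"failure"}


def route(events: list[Event], muted: dict[str, set[str]]) -> tuple[list[Event], dict[str, Counter]]:
    """Split events into urgent notifications and per-author digest counts."""
    urgent: list[Event] = []
    digests: dict[str, Counter] = defaultdict(Counter)
    for ev in events:
        if ev.kind in muted.get(ev.author, set()):
            continue                                   # author opted out of this kind
        if ev.kind in URGENT_KINDS:
            urgent.append(ev)
        else:
            digests[ev.author][(ev.ro, ev.kind)] += 1  # batch repeated events
    return urgent, dict(digests)


def render_digest(author: str, counts: Counter) -> str:
    """Format one author's batched events as a short text digest."""
    lines = [f"Digest for {author}:"]
    for (ro, kind), n in sorted(counts.items()):
        lines.append(f"  {ro}: {n} x {kind}")
    return "\n".join(lines)


if __name__ == "__main__":
    events = [Event("ana", "download", "ro-1")] * 3 + [
        Event("ana", "failure", "ro-1"),
        Event("ana", "annotation", "ro-2"),
        Event("luis", "download", "ro-3"),
    ]
    urgent, digests = route(events, muted={"luis": {"download"}})
    print(urgent)
    for author, counts in digests.items():
        print(render_digest(author, counts))
```

Batching trades immediacy for fewer messages; the `muted` mapping plays the role of per-author preferences in a notification management tool.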
Astronomy WP in Wf4Ever

Astronomy WP
•  Development and implementation of "Extraction of Sources"
•  Development and implementation of "Modelling of 3D Data"
•  Identify experiments that could be migrated to the Wf/RO methodology
•  Contribute to IVOA in semantics for processes

Other WPs
Continue providing feedback
•  RO model, architecture, integrity & authenticity, information quality, etc.
•  Software integration and improved functionalities (SAMP, Taverna, etc.)
•  Prototypes for management and visualization of ROs

Community engagement
•  Approach astro-informaticians
•  Continue pushing in the IVOA community
•  Tackle collaboration with publishers

Workflows & IVOA

Distributed data analysis in the VO
•  Panchromatic, multi-archive, multi-facility
•  Executes in the VO infrastructure
•  Orchestration of simple services

Present processing pipelines
•  Produce exploitable data
•  Provenance modeling
•  VO-compliant data

Data processing from the VO
•  Provide custom re-processing to VO users
•  Virtual data generation through UWS in VOSpace

Workflows VO characterization
•  Inputs
•  Outputs
•  Processes
•  Descriptions
•  Metadata
•  Etc.

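Virtual data generated through UWS comes from asynchronous jobs whose phase a Wf must track. The sketch below uses phase names defined by the IVOA UWS recommendation, but only a subset of them (HELD, SUSPENDED, UNKNOWN and others are omitted), and the transition table is a simplifying assumption made for illustration rather than the full set of rules in the standard. It checks that a sequence of polled phases is plausible and reports whether the job has finished, which is the kind of record a provenance-aware Wf could keep.

```python
from enum import Enum


class Phase(Enum):
    """A subset of UWS job phases."""
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"
    ABORTED = "ABORTED"


# Simplified transition table: an assumption for illustration, not the full UWS rules.
ALLOWED: dict[Phase, set[Phase]] = {
    Phase.PENDING: {Phase.QUEUED, Phase.ABORTED},
    Phase.QUEUED: {Phase.EXECUTING, Phase.ABORTED},
    Phase.EXECUTING: {Phase.COMPLETED, Phase.ERROR, Phase.ABORTED},
    Phase.COMPLETED: set(),
    Phase.ERROR: set(),
    Phase.ABORTED: set(),
}


def is_terminal(phase: Phase) -> bool:
    """A phase with no outgoing transitions ends the job."""
    return not ALLOWED[phase]


def check_history(observed: list[str]) -> Phase:
    """Validate a sequence of polled phase strings and return the last phase."""
    if not observed:
        raise ValueError("empty job history")
    phases = [Phase(p) for p in observed]  # raises ValueError on unknown names
    for prev, nxt in zip(phases, phases[1:]):
        if nxt is prev:
            continue  # repeated polls may report the same phase
        if nxt not in ALLOWED[prev]:
            raise ValueError(f"unexpected transition {prev.value} -> {nxt.value}")
    return phases[-1]


if __name__ == "__main__":
    final = check_history(["PENDING", "QUEUED", "EXECUTING", "EXECUTING", "COMPLETED"])
    print(final.value, is_terminal(final))
```

A real client would obtain these phase strings by polling the job's `phase` resource on the UWS service; here they are supplied directly so the example runs offline.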
Related activities in the VO

IVOA Working Groups
•  Data Modeling
   Characterization, Provenance, etc.
•  Semantics
   Ontologies, vocabularies for processes
•  Data Access Layer
   TAP, self-descriptive protocols, etc.
•  Grid and Web Services
   UWS, VOSpace, SSO, etc.
•  Applications
   SAMP
•  KDD Interest Group
   Knowledge Discovery and Data Mining
•  Data Curation and Preservation Interest Group
   Persistent identifiers, curation of VO resources, etc.
   Wf4Ever Project, US VAO semantic linking of proposals, publications, data

IVOA Note
Scientific Workflows in the VO
André Schaaff & José Enrique Ruiz
workflow@ivoa.net

Questions

More info
http://amiga.iaa.es/p/212-workflows.htm
http://www.wf4ever-project.org
workflow@ivoa.net


More Related Content

PDF
Curating and Preserving Collaborative Digital Experiments
PDF
Web services based workflows to deal with 3D data
PDF
Collaborative Digital Experiments
PDF
Workflow Preservation
PPTX
Making your data work for you: Scratchpads, publishing & the biodiversity dat...
PPTX
Being Reproducible: SSBSS Summer School 2017
PDF
A Clean Slate?
PDF
Implementing a VO archive for datacubes of galaxies
Curating and Preserving Collaborative Digital Experiments
Web services based workflows to deal with 3D data
Collaborative Digital Experiments
Workflow Preservation
Making your data work for you: Scratchpads, publishing & the biodiversity dat...
Being Reproducible: SSBSS Summer School 2017
A Clean Slate?
Implementing a VO archive for datacubes of galaxies

Similar to Wf4Ever: Workflow Preservation (20)

PDF
Workflows in the Virtual Observatory
PDF
VO web-services-based astronomy workflows
KEY
Wf4Ever: Work!ows for Methodology and Science Preservation
PDF
Research Objects in Wf4Ever
PDF
VO Course 12: Workflows & the Wf4Ever project
PDF
OAI7 Research Objects
PPTX
Deroure Repo3
PPTX
Deroure Repo3
PDF
Roberts leiden110213
PDF
Oak meeting 18/09/2014
PPT
Knowledge Infrastructure for Global Systems Science
PDF
OeRC Seminar
PDF
Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...
PPT
Where are we going and how are we going to get there?
PPTX
myExperiment and the Rise of Social Machines
PDF
ApacheCon NA 2013 VFASTR
PDF
2012 03-28 Wf4ever, preserving workflows as digital research objects
PPTX
OER for repository managers
PDF
Datos enlazados BNE and MARiMbA
PPTX
2012 09 aos-workshop-johanneskeizer
Workflows in the Virtual Observatory
VO web-services-based astronomy workflows
Wf4Ever: Work!ows for Methodology and Science Preservation
Research Objects in Wf4Ever
VO Course 12: Workflows & the Wf4Ever project
OAI7 Research Objects
Deroure Repo3
Deroure Repo3
Roberts leiden110213
Oak meeting 18/09/2014
Knowledge Infrastructure for Global Systems Science
OeRC Seminar
Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...
Where are we going and how are we going to get there?
myExperiment and the Rise of Social Machines
ApacheCon NA 2013 VFASTR
2012 03-28 Wf4ever, preserving workflows as digital research objects
OER for repository managers
Datos enlazados BNE and MARiMbA
2012 09 aos-workshop-johanneskeizer
Ad

More from Jose Enrique Ruiz (15)

PDF
Jupyter notebooks on steroids
PDF
IPython Notebooks - Hacia los papers ejecutables
PDF
Velocity cubes of galaxies
PDF
Open Science and Executable Papers
PDF
Digital Science: Towards the executable paper
PDF
Digital Science: Reproducibility and Visibility in Astronomy
PDF
Workflows to access and massage VOData
PDF
Curation and Characterization of Web Services
PDF
Digital Science
PDF
Use of CharDM in an archive of velocity cubes
PDF
SVO Activities - SEA 2008
PDF
El Observatorio Virtual - eCA
PDF
Multidimensional Data in the VO
PDF
B0DEGA 3D VO Archive - IVOA 2010 Fall Interop
PDF
Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science i
Jupyter notebooks on steroids
IPython Notebooks - Hacia los papers ejecutables
Velocity cubes of galaxies
Open Science and Executable Papers
Digital Science: Towards the executable paper
Digital Science: Reproducibility and Visibility in Astronomy
Workflows to access and massage VOData
Curation and Characterization of Web Services
Digital Science
Use of CharDM in an archive of velocity cubes
SVO Activities - SEA 2008
El Observatorio Virtual - eCA
Multidimensional Data in the VO
B0DEGA 3D VO Archive - IVOA 2010 Fall Interop
Wf4Ever: Advanced Workflow Preservation Technologies for Enhanced Science i
Ad

Recently uploaded (20)

PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Approach and Philosophy of On baking technology
PPTX
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Encapsulation theory and applications.pdf
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PDF
KodekX | Application Modernization Development
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
NewMind AI Monthly Chronicles - July 2025
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Spectral efficient network and resource selection model in 5G networks
PPT
Teaching material agriculture food technology
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
cuic standard and advanced reporting.pdf
PPTX
Cloud computing and distributed systems.
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
Digital-Transformation-Roadmap-for-Companies.pptx
Approach and Philosophy of On baking technology
PA Analog/Digital System: The Backbone of Modern Surveillance and Communication
Per capita expenditure prediction using model stacking based on satellite ima...
Encapsulation theory and applications.pdf
NewMind AI Weekly Chronicles - August'25 Week I
KodekX | Application Modernization Development
Unlocking AI with Model Context Protocol (MCP)
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
NewMind AI Monthly Chronicles - July 2025
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Spectral efficient network and resource selection model in 5G networks
Teaching material agriculture food technology
The AUB Centre for AI in Media Proposal.docx
cuic standard and advanced reporting.pdf
Cloud computing and distributed systems.
“AI and Expert System Decision Support & Business Intelligence Systems”
Chapter 3 Spatial Domain Image Processing.pdf
Diabetes mellitus diagnosis method based random forest with bat algorithm

Wf4Ever: Workflow Preservation

  • 1. Grant agreement no.: 27092 Workflows Preservation! José Enrique Ruiz, Lourdes Verdes-Montenegro, Susana Sánchez, ! Juan de Dios Santander-Vela and the Wf4Ever Team ! IAA-CSIC! ! January 18th 2012! 7th Workflow Working Group Meeting - AS OV France!
  • 2. Who am I ?! Instituto Astrofísica de Andalucia - CSIC! 2
  • 3. AMIGA Group! Analysis of the interstellar Medium of Isolated Galaxies! ! Statistical baseline of isolated galaxies to compare! ! with the behaviour of galaxies in denser environments! Multi study of ~1000 galaxies! +! Need of intensive and complex analysis of 3D data! 2D spatial + 1 Velocity! IAA-CSIC! Uuiv . Granada, Obs. Marseille, Obs. Paris, NAOJ, ! FCRAO, UNAM, Univ. Edinburgh, IRAM, ESO,! Kapteyn Astronomical Institute.! ! P.I. Lourdes Verdes-Montenegro! http://amiga.iaa.es! 3
  • 4. What is Wf4Ever ?! EU funded FP7 STREP Project! December 2010 – December 2013 ! 1.  Intelligent Software Components (ISOCO, Spain)! 2.  University of Manchester (UNIMAN, UK)! 2 7 3.  Universidad Politécnica de Madrid (UPM, Spain)! 5! 4! 4.  Poznan Supercomputing and Networking Centre (PSNC, Poland)! 5.  Universisty of Oxford (OXF, UK)! 6.  Instituto de Astrofísica de Andalucía (IAA, Spain)! 1! 3! 7.  Leiden University Medical Centre (LUMC, NL)! 6! 4
  • 5. What is Wf4Ever ?! Technological infrastructure for the preservation and efficient retrieval and reuse of scientific workflows in a range of disciplines! Partners! •  One SME! Goals! •  Six public organizations! ! Archival, classification, and indexing Technological Core Competencies! of scientific workflows and their associated materials in scalable •  Digital Libraries! •  Workflow Management ! semantic repositories, providing advanced access and recommendation •  Semantic Web! •  Integrity & Authenticity! capabilities! •  Provenance! ! •  Information Quality! Creation of scientific communities to collaboratively share, reuse and evolve Case Studies! workflows and their parts, stimulating the development of new scientific •  Astronomy (IAA)! knowledge! •  Genome-wide Analysis and Biobanking (LUMC)! ! 5
  • 6. What are our Scientific workflows ?! Combination of data and processes into a configurable and structured set of steps that implement semi- automated computational solutions in problem solving! Types of workflows in Astronomy! •  Personal script-based recipes ! Python, IDL, Software..! •  Multi-archive VO recipes! •  Internal group developments ! GRID, Clusters..! •  Processing pipelines! Provide Data, Computing Infrastructure, Tools..! Scientifically exploitable results vs. scientific insight ! Wfs on Easily accessible and reproducible (Shared)! steroids ! 6
  • 7. Why workflow preservation is important ?! ! Astronomy research is entirely digital! ! Time has come to go “Beyond the PDF” ! ! Preserved experiments! •  Methodology “in action”! Discoverable !! •  All data are exposed! •  Reproducible! •  Repeatable! Trust assessment •  Re-usable! •  Re-purposeable! •  Participatory! •  Collaborative! •  Formative! Social aspect 7
  • 8. Related Initiatives! Cyber-SKA! Provide infrastructure that will be required to address the needs of future radio telescopes such as the Square Kilometre Array! ! Web based workflow builder ! •  Image segmentation! •  Image mosaicking (Montage)! •  Spatial reprojection! •  Plane extraction from data cubes! IceCore! University of Helsinki! Web portal for executing workflows – University of Helsinki! Common interface for Wfs distributed in different engine servers! 8 !
  • 9. Related Initiatives! Montage! •  FITS Image Mosaicking! •  Toolkit for Desktops, Clusters and Grids! ! Astro-WISE! •  Distributed data storage and computing infrastructure! •  Track process provenance of final data products! •  Calibration and analysis of images! ! Helio-VO! •  Solar physics Virtual Observatory! •  Enable workflow execution via Taverna Server! ! Workflows VO France! •  Provide use cases mainly oriented VO ! •  AÏDA Workflow System implements FITS validation with CharDM ! 9
  • 10. Tools! Taverna! •  Strongly typed bioinformatics! •  Taverna Engine! •  Taverna Server! •  Taverna Workbench! Kepler! •  Generic Science! •  Workflow System! Triana! •  Local execution! •  Clusters RMI! •  GRID! •  Web Services! 10 !
  • 11. Tools! Aladin JLOW Plugin! Aladin plugin API permits graphical replacement of Aladin tools! 11
  • 13. Tools! ESO Reflex! Finland’s in-kind contribution to ESO! •  Prototype/feasibility study! •  Initially based on Taverna 1! Current implementation based on Kepler! ! AstroTaverna! AstroGrid Development! Prototype, marrying of VO Desktop & Taverna 1! Library of Taverna functions to access VO Desktop’s API! ! Status! Wrapper libraries only for Taverna 1! 13
  • 14. Digital Repositories! The recipes store! Oxford e-Research Centre! ! •  Find workflows! •  Share workflows and files! •  Find people! •  Build communities! •  Publish packages! •  Tag workflows! •  Score and rate workflows! •  Comment on workflows! •  Write reviews! 14
  • 15. Digital Repositories! !! !! Astronomy in MyExperiment! •  10 interested users ! •  No VO-services-based Wfs! •  Some Helio Project Wfs! •  VOTables parsing! •  Internal services! •  Astro-Shims ! •  BioCatalogue vs. VORegistry ! ! Astro-Wf4Ever specific Wfs! •  Catalogue Queries! ! ! 15
  • 16. The upcoming context! Processes should benefit of the same privileges acquired by Data! Digital Libraries of Workflows may boost the use of the existing infrastructure of data (VO)! Users need templates !! ! Wf4Ever is also a project about! •  How to publish! •  How to do review by peers! •  Improve visibility by reference and attribution! ! Publishers should play an import role! 16
  • 17. The upcoming context! The next generation of archives! ! Much wider FoV and spectral coverage! •  Huge sized datasets (~ tens TB)! •  Big Data science highly dependent on I/O data rates! •  Subproducts as virtual data generated on-the-fly! Automated surveys! •  Huge amount of tabular data! •  Services for Knowledge Discovery in Databases! ! 17
  • 18. The upcoming context! We are moving into a world where ! •  computing and storage are cheap ! •  data movement is death! Archives should evolve from data providers into virtual data and services providers, where web services may help to solve bandwidth issues.! ! Archives speaking self-descriptif web services! •  Smaller virtual data subproducts! •  Distributed, multi-archive, multi-wavelength astronomy! 18
  • 19. Considerations! (Data) Workflow preservation! ! •  Interpreted through their execution! •  Complex models are required to describe them! •  Severely vulnerable to obsolescence ! •  Applications ! •  Libraries! •  Operating environment! •  Provenance is a complex issue in a cloud of services! •  Resources are often beyond control of scientists! •  Alleviate decay of external resources via alternates! 19
  • 20. Considerations! (Data) Workflow preservation! ! •  Versioning of the whole or its components ! •  Restricted access on data and processes! •  Permissions, licenses, platform, costs, etc.! •  Semantic discovery of Wfs, processes, web services! •  Metrics for quality: use stats, logs uptime, etc.! •  Integrity evaluation! •  Completeness checking! •  Ensure trustworthiness and authenticity! •  Workflows for workflow curation! 20
  • 21. A first approach in Workflow Preservation! Preserve, Retrieve, Reconstruct, Replay! ! •  Retrieve! Characterization! •  Functionality of the Wf or its modules! •  What are the inputs and outputs! •  Metadata, authority, keywords! Semantics and •  Reconstruct! Modeling! •  Understand dependencies and components! •  Technical specificities! •  Replay! Execution Tools! •  Check the success of the preservation method! ! •  Referenced and acknowledged! Long-term IDs! 21 !
  • 22. Wf4Ever Update! RO. The Research Object! ! All components related to the research lifecycle of an experiment should be available. ! ! Preserved and easily retrievable ! ! •  Proposals! •  Data! •  Processes! •  Publications! ! 22
  • 23. Astronomy WP in Wf4Ever! Development and Implementation of Golden Exemplars! •  Local catalogue curation based on VO Archives! •  Sources extraction and crossmatching from 2D images! •  Modeling and analysis of 3D velocity cubes of galaxies! ! Create a community of users! •  Development of Prototypes and Tools! •  Dissemination! ! Integrate existing astronomy software with Wf4Ever Tools! •  SAMP and WebSAMP! ! Provide interoperable models, ontologies and vocabularies for the characterization of workflows, processes and RO components ! ! 23
• 24. Astronomy WP in Wf4Ever
  •  Characterization of the astronomy domain in workflows
  •  Detailed study of IVOA standards and web services
  •  Exploration of similar initiatives for the curation of digital objects
  •  Sociological study of astronomers and their working methodology
  •  Extraction of user and technical requirements
  •  Extraction of Taverna user requirements for astronomy
  •  Implementation of the first Golden Exemplar
  •  Early contacts in the IVOA to create a community of users
• 25. Wf4Ever Update
  Users' Requirements
  •  Functional requirements for the Wf4Ever "working" platform
  •  Focused on improving collaboration and reuse
  •  Interoperability in exchanging scientific methodology
  •  Expose an experiment in a structured way so that others can understand it
  RO Modeling
  •  Model for interlinked components in a Research Object
  •  Strategies for assessing integrity and authenticity
  •  Attempts at metrics for Information Quality
  We need to build what we would like to preserve
• 27. Wf4Ever Update
  Proposed improvements for Taverna
  •  VO Registry access perspective
  •  STILTS VOTable library integration
  •  SAMP (connectivity with VO software)
  •  Python-based Beanshells (done)
  •  Simple standard functions for astronomy
  •  ODBC connector to databases
• 28. Wf4Ever Update
  Architecture
  •  Search & Retrieval Service
  •  Recommender Service
  •  Integrity & Authenticity (I&A) Evaluation Service
  •  Notification Service
  User-Tools Prototypes
  •  RO Command Line Tool
  •  RO Annotator
  •  RO Box
• 29. Wf4Ever Update: ROBox
  Seamless contribution to a collaborative working platform: a shared folder in Dropbox becomes a Working RO, with automatic generation of metadata.
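Automatic metadata generation for a shared folder can be approximated by walking the folder and recording, for each file, its relative path, size, checksum and guessed media type. This is a minimal sketch of the idea, not ROBox itself: the manifest layout (`aggregates`, `sha256`, `mediaType`) is a hypothetical format chosen for illustration and is not the Wf4Ever RO model.

```python
import hashlib
import mimetypes
from pathlib import Path
from typing import Dict, List, Union


def describe_folder(root: Union[str, Path]) -> Dict[str, List[Dict[str, object]]]:
    """Build a simple manifest for every file under `root` (hypothetical format)."""
    root = Path(root)
    entries = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()
        media_type, _ = mimetypes.guess_type(path.name)
        entries.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),   # lets later readers detect changes
            "mediaType": media_type or "application/octet-stream",
        })
    return {"aggregates": entries}
```

For example, `json.dumps(describe_folder("shared/my-ro"), indent=2)` would serialize the manifest for a folder path of your choosing. Recording a checksum per file is what later makes integrity checks possible: recomputing the digests and comparing them reveals which components changed.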
• 30. Wf4Ever Update: RO Digital Library and RO Import
• 31. Wf4Ever Update
  •  Anatomy of a Research Object
  •  Annotations on RO components
  •  RO graphical representation
  •  Data/session inspection (SAMP)
• 32. Wf4Ever Update
• 33. Wf4Ever Update: RO Visualization
• 34. Wf4Ever Update
  Keywords: [galaxies] [catalogs]
  Integrity
  Rating
  Downloads: 36
  Citations: [2]
  Re-used: [1]
  Comments: [4]
  [Previous version | Next version]
• 35. Wf4Ever Update
  Keywords: [galaxies] [catalogs]
  Integrity
  Rating
  Downloads: 36
  Re-used: [1]
  Comments: [4]
  [Previous version | Next version]
• 36. Wf4Ever Update: Notification Service for Authors
  What should be notified?
  •  Failures
  •  Downloads
  •  Annotations
  •  Linking/similarity
  •  Modifications on the Working RO
  •  Acknowledgements
  Notification Management Tool: avoid spam
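"Avoid spam" suggests two controls: authors opt in to event types, and low-priority events are batched into digests instead of one message each. The sketch below assumes that failures are urgent and everything else is digested; `Notifier`, `Event` and the message format are hypothetical, not the Wf4Ever Notification Service interface.

```python
from collections import Counter, defaultdict
from enum import Enum
from typing import Dict, List, Set, Tuple


class Event(Enum):
    FAILURE = "failure"
    DOWNLOAD = "download"
    ANNOTATION = "annotation"
    LINK = "link/similarity"
    MODIFICATION = "modification"
    ACKNOWLEDGEMENT = "acknowledgement"


URGENT = {Event.FAILURE}  # assumption: failures go out at once, everything else is batched


class Notifier:
    """Per-author notification queue: opt-in by event type plus periodic digests."""

    def __init__(self, subscriptions: Dict[str, Set[Event]]):
        self.subscriptions = subscriptions
        self.pending: Dict[str, Counter] = defaultdict(Counter)
        self.outbox: List[Tuple[str, str]] = []

    def record(self, author: str, ro_id: str, event: Event) -> None:
        if event not in self.subscriptions.get(author, set()):
            return  # author did not ask for this kind of event
        if event in URGENT:
            self.outbox.append((author, f"{ro_id}: {event.value}"))
        else:
            self.pending[author][(ro_id, event.value)] += 1

    def flush_digests(self) -> List[Tuple[str, str]]:
        """Collapse pending events into one summary message per author."""
        for author in sorted(self.pending):
            counts = self.pending[author]
            summary = "; ".join(f"{ro} {kind} x{n}" for (ro, kind), n in sorted(counts.items()))
            self.outbox.append((author, summary))
        self.pending.clear()
        return self.outbox


notifier = Notifier({"author-a": {Event.FAILURE, Event.DOWNLOAD}})
for _ in range(3):
    notifier.record("author-a", "ro-1", Event.DOWNLOAD)
notifier.record("author-a", "ro-1", Event.ANNOTATION)  # not subscribed: dropped
notifier.record("author-a", "ro-1", Event.FAILURE)     # urgent: sent immediately
print(notifier.flush_digests())
# [('author-a', 'ro-1: failure'), ('author-a', 'ro-1 download x3')]
```

Three downloads produce a single digest line rather than three messages, which is the kind of volume reduction a notification management tool would aim for.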
• 37. Astronomy WP in Wf4Ever
  Astronomy WP
  •  Development and implementation of "Extraction of Sources"
  •  Development and implementation of "Modelling of 3D Data"
  •  Explore experiments suitable for migration to the Wf/RO methodology
  •  Contribute to the IVOA on semantics for processes
  Other WPs: continue providing feedback
  •  RO model, architecture, integrity & authenticity, information quality, etc.
  •  Software integration and improved functionalities (SAMP, Taverna, etc.)
  •  Prototypes for the management and visualization of ROs
  Community engagement
  •  Approach astro-informaticians
  •  Continue pushing in the IVOA community
  •  Tackle collaboration with publishers
• 38. Workflows & IVOA
  Distributed data analysis in the VO
  •  Panchromatic, multi-archive, multi-facility
  •  Executes in the VO infrastructure
  •  Orchestration of simple services
  Workflows VO Characterization
  Present processing pipelines
  •  Inputs
  •  Produce exploitable data
  •  Outputs
  •  Provenance modeling
  •  Processes
  •  Descriptions
  •  VO-compliant data
  •  Metadata
  •  Etc.
  Data processing from the VO
  •  Provide custom re-processing to VO users
  •  Virtual data generation through UWS in VOSpace
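Custom re-processing through UWS means exposing each request as a job that moves through well-defined phases. The sketch below models a simplified subset of the UWS phase transitions in-process; it omits HELD, SUSPENDED, UNKNOWN and ARCHIVED, does no HTTP, and the `Job`/`run_job` names and the multiplication task are hypothetical stand-ins for a real VO service.

```python
from enum import Enum
from typing import Any, Callable, Dict


class Phase(Enum):
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    ERROR = "ERROR"
    ABORTED = "ABORTED"


# Simplified subset of UWS phase transitions (HELD, SUSPENDED, UNKNOWN and
# ARCHIVED are left out). Terminal phases have no outgoing transitions.
ALLOWED = {
    Phase.PENDING: {Phase.QUEUED, Phase.ABORTED},
    Phase.QUEUED: {Phase.EXECUTING, Phase.ABORTED},
    Phase.EXECUTING: {Phase.COMPLETED, Phase.ERROR, Phase.ABORTED},
}


class Job:
    def __init__(self, job_id: str, parameters: Dict[str, Any]):
        self.job_id = job_id
        self.parameters = dict(parameters)
        self.phase = Phase.PENDING        # jobs start in PENDING
        self.results: Dict[str, Any] = {}

    def move_to(self, new_phase: Phase) -> None:
        if new_phase not in ALLOWED.get(self.phase, set()):
            raise ValueError(f"{self.phase.value} -> {new_phase.value} is not allowed")
        self.phase = new_phase


def run_job(job: Job, task: Callable[..., Any]) -> Phase:
    """Drive a job from PENDING to a terminal phase, running `task` in-process."""
    job.move_to(Phase.QUEUED)             # in UWS the client requests this by posting PHASE=RUN
    job.move_to(Phase.EXECUTING)
    try:
        # A real service would publish result URIs (e.g. VOSpace nodes); here we keep the value.
        job.results["result"] = task(**job.parameters)
    except Exception as exc:
        job.results["error"] = str(exc)
        job.move_to(Phase.ERROR)
    else:
        job.move_to(Phase.COMPLETED)
    return job.phase


# Hypothetical re-processing task standing in for a VO service.
job = Job("job-1", {"factor": 2, "value": 21})
print(run_job(job, lambda factor, value: factor * value), job.results)
```

Making illegal transitions raise, rather than silently overwrite the phase, keeps the job history trustworthy, which matters when the job record doubles as provenance for the virtual data it produced.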
• 39. Related activities in the VO
  IVOA Working Groups
  •  Data Modeling: characterization, provenance, ...
  •  Semantics: ontologies, vocabularies for processes
  •  Data Access Layer: TAP, self-descriptive protocols, ...
  •  Grid and Web Services: UWS, VOSpace, SSO, ...
  •  Applications: SAMP
  •  IG Knowledge Discovery and Data Mining (KDD)
  •  IG Data Curation and Preservation: persistent identifiers, curation of VO resources, ...; Wf4Ever Project; US VAO semantic linking of proposals, publications and data
  IVOA Note: "Scientific Workflows in the VO", André Schaaff & José Enrique Ruiz (workflow@ivoa.net)