www.eosc-synergy.eu
Integration of WORSICA’s thematic service in EOSC: challenges and achievements
LNEC: Ricardo Martins, Alberto Azevedo, Anabela Oliveira
LIP: Samuel Bernardo, Jorge Gomes, Mário David, João Pina
DANS-KNAW: Slava Tykhonov
IFCA: Pablo Orviz
Summary
• What is WORSICA?
• Challenges
• Service Architecture (Past/Present)
• Technical Description
• Conclusions
• Achievements
• Future work
What is WORSICA?
• WORSICA (Water mOnitoRing SentInel Cloud plAtform) is a web service that aims to integrate remote sensing and in-situ data to determine the presence of water in coastal and inland areas. Its applications range from mapping flooded areas (caused by rainfall, storms, hurricanes or tsunamis) to detecting large water leaks in major irrigation networks.
• WORSICA will reuse and connect work from other LNEC projects/services:
• OPENCoastS (https://opencoasts.ncg.ingrid.pt)
• WADI (https://www.waditech.eu)
• Mosaic.pt (http://mosaic.lnec.pt)
WORSICA
Main products
Coastline detection
Use of remote sensing (Sentinel-2, Pleiades and multispectral drone imagery) to detect the water-land interface and, possibly, to calculate the Digital Elevation Model for each line using the EOSC-hub OPENCoastS service.
Water bodies detection
Determination of water indexes to detect water bodies in inland areas (lagoons, reservoirs, etc.) using satellite and drone-based imagery.
Water leak detection
Builds on the work developed in the H2020 WADI project (with “low-resolution” Sentinel-2 images) and aims to improve it using Pleiades and drone-based imagery.
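Water indexes are usually normalized differences of two spectral bands. The slides do not say which indexes or thresholds WORSICA uses, so the following is only a minimal sketch, assuming the McFeeters NDWI, (Green - NIR) / (Green + NIR), computed on Sentinel-2 bands B03 (green) and B08 (NIR), with an illustrative water threshold of 0:

```python
"""Minimal NDWI sketch: flag pixels as water from green and NIR reflectances.

Assumptions (not from the slides): McFeeters NDWI on Sentinel-2 bands B03 (green)
and B08 (NIR), and a fixed water threshold of 0.0.
"""
from typing import List, Optional

Grid = List[List[float]]


def ndwi(green: Grid, nir: Grid) -> List[List[Optional[float]]]:
    """Per-pixel (green - nir) / (green + nir); None where green + nir is 0."""
    result: List[List[Optional[float]]] = []
    for g_row, n_row in zip(green, nir):
        row: List[Optional[float]] = []
        for g, n in zip(g_row, n_row):
            total = g + n
            row.append((g - n) / total if total != 0 else None)
        result.append(row)
    return result


def water_mask(index: List[List[Optional[float]]], threshold: float = 0.0) -> List[List[bool]]:
    """True where the index is defined and above the threshold."""
    return [[v is not None and v > threshold for v in row] for row in index]


if __name__ == "__main__":
    # Toy 2x2 reflectances: left column water-like, right column land-like.
    green = [[0.10, 0.08], [0.12, 0.06]]
    nir = [[0.02, 0.30], [0.03, 0.25]]
    print(water_mask(ndwi(green, nir)))  # → [[True, False], [True, False]]
```

Real imagery would be handled as raster arrays (e.g. read with GDAL) rather than nested lists; plain lists keep the sketch dependency-free.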
WORSICA
User Community and Usage model
○ Users: researchers in coastal engineering and in water irrigation network management.
○ Access: through a single portal.
○ Usage:
○ Configure and run the workflow by choosing the Region of Interest (ROI) and the imagesets [1]. The service will download the imagesets, process them and generate the final products (water maps) [2].
○ Users will also be able to upload their own data from drone surveys or other private satellite imagery (e.g. Pleiades) for processing.
[1]
[2]
WORSICA
Processing Workflow
○ Workflow
○ The processing workflow runs sequentially.
○ Larger ROIs mean more inputs, so processing needs more time and more resources.
○ Applying this workflow to a large processing job (e.g. water leak detection) that requires processing hundreds of imagesets takes days to complete.
○ For a given ROI and day, more than one imageset may be available for download (from different orbits), so these inputs must be merged first.
Workflow diagram (Start → Finish; “For each imageset input...”):
1A) User-uploaded imagesets (private source: Pleiades/drone/other VHR)
1B) Download imagesets (public source: ESA Sentinel)
2) Atmospheric correction (diagram decision: “Sentinel2 L1C?”)
3) Resampling + crop
4) Merging inputs
5) RGB
6) Water indexes + filtering
7) Final outputs
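Two decisions in this workflow can be read from a Sentinel-2 product name: whether atmospheric correction is needed (Level-1C products are top-of-atmosphere) and which imagesets share an acquisition day and must be merged. A minimal sketch, assuming ESA's compact Sentinel-2 naming convention and that all names belong to the same ROI; the function names and example product names are hypothetical, and the sketch groups by sensing day only, whatever the orbit or tile:

```python
"""Sketch: plan atmospheric correction and per-day merging for Sentinel-2 inputs.

Assumes ESA's compact product naming:
MMM_MSIXXX_YYYYMMDDTHHMMSS_Nxxyy_ROOO_Txxxxx_<discriminator>.
Function names are hypothetical, not WORSICA's API.
"""
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, NamedTuple


class Imageset(NamedTuple):
    name: str
    level: str      # "L1C" (top of atmosphere) or "L2A" (bottom of atmosphere)
    acquired: date  # sensing date of the datatake
    tile: str       # MGRS tile, e.g. "T29SND"


def parse_product_name(name: str) -> Imageset:
    parts = name.split("_")
    if len(parts) != 7 or not parts[1].startswith("MSI"):
        raise ValueError(f"not a compact Sentinel-2 MSI product name: {name}")
    level = parts[1][3:]  # "MSIL1C" -> "L1C"
    acquired = datetime.strptime(parts[2], "%Y%m%dT%H%M%S").date()
    return Imageset(name, level, acquired, parts[5])


def needs_atmospheric_correction(img: Imageset) -> bool:
    """In this sketch, step 2 runs only for Level-1C inputs."""
    return img.level == "L1C"


def group_by_day(names: List[str]) -> Dict[date, List[Imageset]]:
    """Group inputs by sensing day; groups with more than one entry need step 4."""
    groups: Dict[date, List[Imageset]] = defaultdict(list)
    for n in names:
        img = parse_product_name(n)
        groups[img.acquired].append(img)
    return dict(groups)


if __name__ == "__main__":
    # Illustrative product names: two on one day, one on another day.
    names = [
        "S2A_MSIL1C_20200115T112401_N0208_R037_T29SND_20200115T114543",
        "S2A_MSIL1C_20200115T112401_N0208_R037_T29SNC_20200115T114543",
        "S2B_MSIL2A_20200120T112339_N0213_R037_T29SND_20200120T123110",
    ]
    for day, imgs in sorted(group_by_day(names).items()):
        print(day, len(imgs), [needs_atmospheric_correction(i) for i in imgs])
```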
WORSICA
Architecture (state before EOSC-Synergy)
○ Everything on the Cloud.
○ No Docker containers.
○ Frontend provides the portal to the user and communicates with its backend.
○ Backend receives and responds to user requests, checks available imagesets and stores final products in the DB.
○ A ‘service’ downloads the imageset, then processes it using the available tools from the toolbox.
WORSICA
Challenges in EOSC-Synergy
○ Processing:
○ Network: download speeds and number of simultaneous downloads of satellite data from operational providers (e.g. ESA or Pleiades);
○ Storage: of the intermediate and final products;
■ Each input is an imageset of ~1.1 GB;
■ Intermediate and final outputs are produced during processing.
○ Computational resources: GPUs and plenty of RAM are highly recommended to speed up image processing and to prevent bottlenecks in the service during processing.
■ Processing an imageset requires at least 8 GB of RAM.
■ More is required if the imageset needs atmospheric correction.
■ The processing workflow needs to be parallelized.
○ Ensure resilience and speedup.
○ Thematic service:
○ Ensure scalability, redundancy and portability of the service.
○ Ensure (inter)operability with other thematic services.
○ Other:
○ Ensure FAIRness of the data products generated by users of this thematic service.
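A back-of-the-envelope sketch of these numbers, using the slide's figures (~1.1 GB per input imageset, at least 8 GB of RAM per imageset) plus two illustrative assumptions: a job of 300 imagesets (standing in for "hundreds") and worker nodes with 64 GB of RAM. Intermediate and final outputs, and the extra memory for atmospheric correction, are not counted:

```python
"""Back-of-the-envelope sizing for a WORSICA-style batch job.

From the slides: ~1.1 GB per input imageset, >= 8 GB RAM per imageset.
Illustrative assumptions: 300 imagesets per job, 64 GB RAM per worker node.
"""
import math

INPUT_MB_PER_IMAGESET = 1100   # ~1.1 GB, kept in MB to avoid float rounding
MIN_RAM_GB_PER_IMAGESET = 8


def raw_input_gb(n_imagesets: int) -> float:
    """Download volume for the inputs alone (no intermediate or final products)."""
    return n_imagesets * INPUT_MB_PER_IMAGESET / 1000


def concurrent_per_node(node_ram_gb: int) -> int:
    """How many imagesets fit in one node's RAM at the minimum requirement."""
    return node_ram_gb // MIN_RAM_GB_PER_IMAGESET


def nodes_for_full_parallelism(n_imagesets: int, node_ram_gb: int) -> int:
    """Nodes needed to process every imageset of the job at the same time."""
    return math.ceil(n_imagesets / concurrent_per_node(node_ram_gb))


if __name__ == "__main__":
    print(raw_input_gb(300))                    # 330.0 GB of raw input
    print(concurrent_per_node(64))              # 8 imagesets per 64 GB node
    print(nodes_for_full_parallelism(300, 64))  # 38 nodes to run all at once
```

Even at this optimistic minimum, running a whole job at once would need dozens of nodes, which is why the workflow has to be parallelized across many machines rather than run on a single server.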
WORSICA
Architecture (current state during EOSC-Synergy)
Architecture:
○ Frontend component (web portal on the Cloud)
○ Intermediate component (simple task orchestrator with Celery, on the Cloud)
○ Processing component (an instance submitted to the GRID infrastructure)
○ Dataverse, storage, broker and database are isolated components
Advantages
○ Provides redundancy, scalability and flexibility
○ Good load distribution across components
○ Components run in Docker containers, which eases portability and installation
○ Processing job submissions are sent by the WORSICA intermediate service to the GRID infrastructure
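With ArcCE as the planned workload manager, each processing job sent to the GRID would be described in ARC's xRSL job description language. A minimal sketch that renders one such description from Python; the wrapper script `worsica_process.sh`, its arguments and the output archive name are hypothetical, and only the 8 GB memory floor comes from the slides:

```python
"""Sketch: generate an ARC xRSL job description for one imageset.

Hypothetical: the wrapper script, its arguments and the file names.
From the slides: at least 8 GB of RAM per imageset (xRSL memory is in MB).
"""
from typing import List, Tuple

MIN_MEMORY_MB = 8 * 1024


def _file_list(files: List[Tuple[str, str]]) -> str:
    # Each entry is ("name" "url"); an empty url keeps ARC's default handling.
    return "".join(f'("{name}" "{url}")' for name, url in files)


def xrsl_job(job_name: str, imageset: str, roi_id: int,
             memory_mb: int = MIN_MEMORY_MB) -> str:
    if memory_mb < MIN_MEMORY_MB:
        raise ValueError("an imageset needs at least 8 GB of RAM")
    if any('"' in value for value in (job_name, imageset)):
        raise ValueError("double quotes are not supported in this sketch")
    script = "worsica_process.sh"            # hypothetical wrapper around steps 2-7
    inputs = [(script, ""), (imageset, "")]  # "" = uploaded from the client
    outputs = [("outputs.tar.gz", "")]       # "" = kept on the CE for retrieval
    return "\n".join([
        "&",
        f'(jobName="{job_name}")',
        f'(executable="{script}")',
        f'(arguments="{imageset}" "{roi_id}")',
        f"(inputFiles={_file_list(inputs)})",
        f"(outputFiles={_file_list(outputs)})",
        f'(memory="{memory_mb}")',
        '(stdout="stdout.txt")',
        '(stderr="stderr.txt")',
    ])


if __name__ == "__main__":
    print(xrsl_job("worsica-roi110-sim230",
                   "S2A_MSIL1C_20200115T112401_N0208_R037_T29SND_20200115T114543.zip", 110))
```

The output could be saved as a `.xrsl` file and submitted with the ARC client (`arcsub`); attribute semantics should be checked against the ARC documentation for the deployed version.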
Technical Description: Services
Service | Used: before | Used: planned | Provider: before | Provider: planned
AAI | Local | EGI Check-In | INCD | EGI Federation
Workload Mng. | Local batch system | ArcCE, SLURM | INCD | EOSC-Synergy
Resource Mng. | Manual | IM (TOSCA) | INCD | EOSC-Synergy
Data Storage | Local | Nextcloud | INCD | EOSC-Synergy
Monitoring | - | ARGO | - | EGI Service Monitoring
Other: Hydrodynamic water forecasts | - | OPENCoastS | - | EOSC-hub marketplace
Other: Dataverse | - | Dataverse | - | EOSC-Synergy
Computing Resources | Local | FedCloud and EGI HTC | INCD | EOSC-Synergy
Storage Resources | Local | FedCloud and EGI Online storage | INCD | EOSC-Synergy
Technical Description: Planned Services
○ Authentication:
○ EGI Check-In: WORSICA requires federated authentication to access the available EOSC services and resources.
○ Workload Managers:
○ ArcCE with SLURM: allows efficient management of the available GRID resources for HPC, speeding up the processing jobs.
○ Data Manager:
○ Nextcloud: stores the input and output data of processed job submissions.
○ Dataverse: registers the metadata of processed job submissions for FAIR data compliance (FAIRsFAIR). (More in the Service QA and Dataverse presentation.)
○ Ansible and IM:
○ IM: deploys the infrastructure required for job processing, repositories and microservices.
○ SLURM and Kubernetes clusters are deployed from TOSCA templates on an IaaS service, and the remaining services will be installed from Docker images. SLURM and Kubernetes are configured with Ansible playbooks.
○ CI/CD to automate the integration of the service into the EOSC infrastructure:
○ Jenkins pipelines and unit/functional tests were also developed to comply with the SQAaaS methodologies developed in EOSC-Synergy. (More in the Service QA and Dataverse presentation.)
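Nextcloud exposes each user's files over WebDAV, so storing a job output amounts to an HTTP PUT to the user's files endpoint (`/remote.php/dav/files/<user>/<path>`). A minimal standard-library sketch that only builds the request; the server URL, user, app password and folder layout are hypothetical, and a real client would also create missing folders (WebDAV MKCOL) and handle errors:

```python
"""Sketch: build (but do not send) a Nextcloud WebDAV upload request.

Hypothetical: server URL, user name, app password and folder layout.
"""
import base64
import urllib.request
from urllib.parse import quote


def webdav_put_request(server: str, user: str, app_password: str,
                       remote_path: str, data: bytes) -> urllib.request.Request:
    # quote() keeps "/" so the folder structure survives; spaces become %20.
    url = (f"{server.rstrip('/')}/remote.php/dav/files/"
           f"{quote(user)}/{quote(remote_path.lstrip('/'))}")
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/octet-stream"},
    )


if __name__ == "__main__":
    req = webdav_put_request("https://cloud.example.org", "worsica", "app-password",
                             "roi110/sim230/water_mask.tif", b"...")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```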
Conclusions
WORSICA Achievements on EOSC-Synergy
○ Resilience+Speedup: a better GRID infrastructure to process Sentinel and other VHR imagery, using robust and tested software.
○ Scalability+Redundancy: resources can be adjusted according to usage (more users, more computing resources).
○ Portability+Deployment: this thematic service can be ported to other infrastructures, depending on the computational needs.
○ Federated access+Support: federated access is a requirement for WORSICA to access, and obtain support for, the resources and other EOSC services.
○ Interoperability: this thematic service can act as a bridge for other thematic services, and can connect to them to provide additional products.
○ Data FAIRness: WORSICA user-generated products are made FAIR.
Future work
● Improve existing water processing algorithms and/or implement new ones
● Improve user interaction with the portal
● Improve interoperability with other thematic services
● Continuously improve/update the IT services implemented in the WORSICA thematic service
Thank you.
Ricardo Martins
rjmartins@lnec.pt
http://worsica.lnec.pt/
Service QA and Dataverse
Speaker:
Vyacheslav Tykhonov (DANS-KNAW) on behalf of EOSC-Synergy
Collaborators:
Jorge Gomes (LIP), João Pina (LIP), Mário David (LIP), Ricardo
Martins (LNEC), Alberto Azevedo (LNEC), Samuel Bernardo (LIP),
Wilko Steinhoff (DANS-KNAW), Pablo Orviz (CSIC), Isabel
Campos (CSIC), Germán Moltó (UPV) and Miguel Caballer (UPV)
Introducing Dataverse data repository
● Open-source project developed by IQSS at Harvard University
● Mature product with a long history (since 2006) and a dynamic, experienced development team
● Clear vision and understanding of research communities’ requirements; public roadmap
● Well-developed architecture whose rich APIs allow application layers to be built around Dataverse
● A strong community behind Dataverse helps improve its core functionality and develop it further. DANS-KNAW leads the SSHOC task to deliver a production-ready Dataverse for all partners
WORSICA and Dataverse Repository (1)
Initial service architecture:
• Data lacked globally unique identifiers
• Data was stored in multiple places, internal to the services and not accessible
• No detailed provenance metadata was associated with the data
• Data access did not follow controlled vocabularies that apply the FAIR principles
Dataverse Repository (2)
FAIR service architecture:
• Dataverse provides a repository solution that complies with the FAIR principles
• Define a dataverse and associate it with a persistent identifier namespace
• Associate metadata with the provided and produced data
• Use Data Commons to allow data sharing between all teams and projects
• By default, metadata is released under the CC0 Creative Commons license and is publicly accessible
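Defining a dataverse through the Dataverse native API is one authenticated POST to `/api/dataverses/{parent}` with a JSON description of the new collection. A minimal standard-library sketch that builds the request without sending it; the server URL, API token, parent alias and collection values are hypothetical:

```python
"""Sketch: build a Dataverse native-API request that creates a dataverse.

Hypothetical: server URL, API token, parent alias and collection values.
The request is built but not sent.
"""
import json
import urllib.request
from urllib.parse import quote


def create_dataverse_request(server: str, api_token: str, parent: str, alias: str,
                             name: str, contact_email: str,
                             affiliation: str) -> urllib.request.Request:
    body = {
        "alias": alias,
        "name": name,
        "affiliation": affiliation,
        "dataverseContacts": [{"contactEmail": contact_email}],
        "dataverseType": "RESEARCH_PROJECTS",  # one of Dataverse's collection types
    }
    return urllib.request.Request(
        f"{server.rstrip('/')}/api/dataverses/{quote(parent, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = create_dataverse_request("https://dataverse.example.org", "API-TOKEN", "root",
                                   "worsica", "WORSICA", "contact@example.org", "LNEC")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```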
Dataverse Repository (3)
Integrating code with the Dataverse REST API:
• Can be implemented in any language, depending only on the provided interface, with no library requirements
• Makes it easy to keep the WORSICA code in step with Dataverse service updates
• The current Dataverse REST API is very complete and supports all the necessary operations
• Sensitive data can be shared with confidence using the DataTags system, which will allow a set of security features and access requirements to be applied to file handling
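To illustrate that no client library is needed, here is a minimal standard-library sketch of the native API's add-file call (`POST /api/datasets/:persistentId/add?persistentId=...`), with the multipart body built by hand. The server URL, API token, DOI and file are hypothetical, and the request is only built, not sent:

```python
"""Sketch: add a file to a Dataverse dataset using only the standard library.

Hypothetical: server URL, API token, DOI, file name and description.
The request is built but not sent.
"""
import json
import urllib.request
import uuid
from typing import Dict
from urllib.parse import quote


def multipart_body(fields: Dict[str, str], file_field: str, filename: str,
                   content: bytes, boundary: str) -> bytes:
    """Encode text fields plus one file part as multipart/form-data."""
    parts = []
    for key, value in fields.items():
        parts.append((f'--{boundary}\r\nContent-Disposition: form-data; name="{key}"'
                      f"\r\n\r\n{value}\r\n").encode("utf-8"))
    header = (f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
              f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n')
    parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts)


def add_file_request(server: str, api_token: str, persistent_id: str,
                     filename: str, content: bytes,
                     description: str) -> urllib.request.Request:
    boundary = uuid.uuid4().hex
    json_data = json.dumps({"description": description})
    body = multipart_body({"jsonData": json_data}, "file", filename, content, boundary)
    url = (f"{server.rstrip('/')}/api/datasets/:persistentId/add"
           f"?persistentId={quote(persistent_id, safe='')}")
    return urllib.request.Request(url, data=body, method="POST", headers={
        "X-Dataverse-key": api_token,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    })


if __name__ == "__main__":
    req = add_file_request("https://dataverse.example.org", "API-TOKEN",
                           "doi:10.5072/FK2/EXAMPLE", "water_mask.tif",
                           b"...", "WORSICA water mask (ROI110, Simulation230)")
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```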
Interoperability in WORSICA/Dataverse
● Most variables in WORSICA datasets can be linked in Dataverse to the appropriate ontologies, increasing interoperability and data FAIRness
● Variable names can be included in the dataset metadata in the native language (Portuguese) and linked to URI identifiers for those entities in controlled vocabularies
● Standardized metadata fields are available in the Linked Open Data Cloud through the standard machine-to-machine interfaces provided by Dataverse
WORSICA
Dataverse Metadata (1)
Scope: Citation
Type | Name | Description | Example | Vocabulary URI
title | - | A title for this dataverse | “Figueira da Foz (ROI110 Simulation230)” | -
author (Multiple) | authorName; authorAffiliation | Author, or list of authors, who ran the WORSICA processing that generated this dataverse | “Ricardo Martins”; “LNEC” | -
datasetContact (Multiple) | datasetContactName; datasetContactEmail | Author, or list of authors, who own this dataset | “Ricardo Martins”; “rjmartins(at)lnec.pt” | -
dsDescription (Multiple) | dsDescriptionValue | One or more detailed descriptions of the dataset | "Simulation230 in ROI110 (38,-8,37,-7) with minimum bath depth threshold of -10m and maximum topo depth threshold of 10m" | -
subject (Multiple) | - | An array of thematic subjects that identify the dataverse; these subjects must match the ones provided by Dataverse | ["Computer and Information Science", "Earth and Environmental Sciences"] | https://www.wikidata.org/wiki/Q21198
dataSource (Multiple) | - | List of data sources | (list the name of the processed image sets...) | -
otherReferences (Multiple) | - | Other references for this dataset | (list of URLs of the processed WORSICA products from the storage…) | -
software (Multiple) | softwareName; softwareVersion | List of software used for the processing | “GDAL”; “3.0.4” | https://www.wikidata.org/wiki/Q676202
displayName | - | A name for this dataverse scope | “Figueira da Foz (ROI110 Simulation230) Citation Metadata” | -
Note:
● These variables may change during WORSICA development.
● (Multiple) means the field accepts more than one value.
● This applies to Dataverse 4.19.
● For further information, see the Dataverse documentation.
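In the Dataverse native JSON format, each field in the citation table becomes an object with `typeName`, `multiple`, `typeClass` and `value`, and compound fields nest their child fields. A minimal sketch that assembles the citation block from the table's example values; the helper names are hypothetical, the contact e-mail is a placeholder, and field requirements should be checked against the Dataverse 4.19 API guide:

```python
"""Sketch: the citation block of a Dataverse dataset in native JSON.

Field names as in the WORSICA citation table (Dataverse 4.19); helper names
are hypothetical.
"""
import json
from typing import Any, Dict, List


def primitive(type_name: str, value: str) -> Dict[str, Any]:
    return {"typeName": type_name, "multiple": False,
            "typeClass": "primitive", "value": value}


def compound(type_name: str, entries: List[Dict[str, str]]) -> Dict[str, Any]:
    """A repeatable compound field; each entry maps child field names to values."""
    return {"typeName": type_name, "multiple": True, "typeClass": "compound",
            "value": [{k: primitive(k, v) for k, v in e.items()} for e in entries]}


def citation_block(title: str, authors: List[Dict[str, str]],
                   contacts: List[Dict[str, str]], description: str,
                   subjects: List[str], software: List[Dict[str, str]],
                   display_name: str = "Citation Metadata") -> Dict[str, Any]:
    fields = [
        primitive("title", title),
        compound("author", authors),
        compound("datasetContact", contacts),
        compound("dsDescription", [{"dsDescriptionValue": description}]),
        # Subjects must be values from Dataverse's controlled vocabulary.
        {"typeName": "subject", "multiple": True,
         "typeClass": "controlledVocabulary", "value": subjects},
        compound("software", software),
    ]
    return {"datasetVersion": {"metadataBlocks": {"citation": {
        "displayName": display_name, "fields": fields}}}}


if __name__ == "__main__":
    doc = citation_block(
        "Figueira da Foz (ROI110 Simulation230)",
        [{"authorName": "Ricardo Martins", "authorAffiliation": "LNEC"}],
        [{"datasetContactName": "Ricardo Martins",
          "datasetContactEmail": "contact@example.org"}],  # placeholder e-mail
        "Simulation230 in ROI110 (38,-8,37,-7) with minimum bath depth "
        "threshold of -10m and maximum topo depth threshold of 10m",
        ["Computer and Information Science", "Earth and Environmental Sciences"],
        [{"softwareName": "GDAL", "softwareVersion": "3.0.4"}],
    )
    print(json.dumps(doc, indent=2))
```

A document of this shape is the kind of body sent to `POST /api/dataverses/{alias}/datasets` when creating a dataset.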
WORSICA
Dataverse Metadata (2)
Scope: Geospatial fields
Type | Name | Description | Example | Vocabulary URI
geographicCoverage (Multiple) | country; state; city; otherGeographicCoverage | Details describing the processed ROI | Portugal; Lisbon; Lisbon; Av. do Brasil | https://www.wikidata.org/wiki/Q6256; https://www.geonames.org/countries/PT/portugal.html
geographicUnit (Multiple) | - | A list of default units | ["m", "m"] | -
geographicBoundingBox (Multiple) | westLongitude; eastLongitude; northLongitude; southLongitude | A bounding box, or list of bounding boxes, representing the processed ROI | -9; -8; 38; 37 | https://www.wikidata.org/wiki/Property:P625
displayName | - | A name for this dataverse scope | “Figueira da Foz (ROI110 Simulation230) Geospatial Metadata” | -
Note:
● These variables may change during WORSICA development.
● (Multiple) means the field accepts more than one value.
● This applies to Dataverse 4.19.
● For further information, see the Dataverse documentation.
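The geospatial fields follow the same native JSON pattern as the citation fields. This minimal sketch also rejects bounding boxes whose edges are swapped or out of range; as the table's example shows, northLongitude and southLongitude carry latitudes (38 and 37). Helper names are hypothetical, and boxes crossing the antimeridian are out of scope:

```python
"""Sketch: Dataverse 4.19 geospatial block with a bounding-box sanity check.

Helper names are hypothetical. As in the WORSICA example, northLongitude and
southLongitude carry latitudes (38 and 37).
"""
import json
from typing import Any, Dict, List


def _primitive(name: str, value: str) -> Dict[str, Any]:
    return {"typeName": name, "multiple": False, "typeClass": "primitive", "value": value}


def bounding_box(west: float, east: float, north: float, south: float) -> Dict[str, Any]:
    """One geographicBoundingBox entry; rejects swapped or out-of-range edges."""
    if not -180 <= west <= east <= 180:
        raise ValueError("expected -180 <= west <= east <= 180")
    if not -90 <= south <= north <= 90:
        raise ValueError("expected -90 <= south <= north <= 90")
    values = {"westLongitude": west, "eastLongitude": east,
              "northLongitude": north, "southLongitude": south}
    return {k: _primitive(k, str(v)) for k, v in values.items()}


def geospatial_block(country: str, state: str, city: str, other: str,
                     boxes: List[Dict[str, Any]]) -> Dict[str, Any]:
    coverage = {
        "country": {"typeName": "country", "multiple": False,
                    "typeClass": "controlledVocabulary", "value": country},
        "state": _primitive("state", state),
        "city": _primitive("city", city),
        "otherGeographicCoverage": _primitive("otherGeographicCoverage", other),
    }
    return {"geospatial": {"fields": [
        {"typeName": "geographicCoverage", "multiple": True,
         "typeClass": "compound", "value": [coverage]},
        {"typeName": "geographicBoundingBox", "multiple": True,
         "typeClass": "compound", "value": boxes},
    ]}}


if __name__ == "__main__":
    block = geospatial_block("Portugal", "Lisbon", "Lisbon", "Av. do Brasil",
                             [bounding_box(west=-9, east=-8, north=38, south=37)])
    print(json.dumps(block, indent=2))
```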
SQA process with Selenium tests for Dataverse
Selenium IDE lets you create and replay all UI tests in your browser
Shared tests can be reused by the community to increase reproducibility
SQA for service maturity = unit tests + integration tests
CI/CD pipeline with SQAaaS (S)
Diagram: git push → webhook → Jenkins pipeline (Jenkinsfile): git clone → Run SQA (S) → create Docker image → push to GCP container registry → Kubernetes deployment
1. Developer pushes code to GitHub
2. Jenkins receives a notification (build trigger)
3. Jenkins clones the workspace
4. (S) Runs SQA tests and a FAIRness check
5. (S) Issues a digital badge according to the results
6. (S) SQAaaS API triggers the appropriate workflow
7. Creates a Docker image if successful
8. Pushes the new Docker image to the container registry
9. Updates the Kubernetes deployment
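The key rule in steps 4 to 9 is that an image is built, pushed and deployed only when the SQA stage passes. A plain-Python simulation of that gating, with hypothetical check and stage names; it is not the Jenkinsfile, it does not call the SQAaaS API, and badge issuance (step 5) is left out:

```python
"""Sketch of the pipeline's gating: deliver only if every SQA check passes.

Check and stage names are hypothetical; this simulates the control flow of
steps 4-9 and is neither the Jenkinsfile nor the SQAaaS API.
"""
from typing import Callable, Dict, List


def run_pipeline(sqa_checks: Dict[str, Callable[[], bool]],
                 delivery: List[str]) -> List[str]:
    """Return the log of executed stages; stop before delivery on any failure."""
    log: List[str] = []
    failed: List[str] = []
    for name, check in sqa_checks.items():  # step 4: SQA tests + FAIRness check
        ok = check()
        log.append(f"sqa:{name}:{'pass' if ok else 'fail'}")
        if not ok:
            failed.append(name)
    if failed:
        log.append("stop: no image built")
        return log
    for stage in delivery:                  # steps 7-9: build, push, deploy
        log.append(f"deliver:{stage}")
    return log


if __name__ == "__main__":
    checks = {"unit-tests": lambda: True, "fairness-check": lambda: True}
    for line in run_pipeline(checks, ["docker-build", "registry-push", "k8s-rollout"]):
        print(line)
```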
Final remarks
Dataverse pros:
• provides a FAIR repository for thematic services and has a rich REST interface
• open-source software under the Apache License v2.0
• allows managing both public and private data
• data commons sharing across teams/projects
Dataverse cons:
• integrating data management software with Dataverse was not as quick as expected because of the learning curve
• for persistent identifiers to be citable, an account and associated namespace must be acquired, for a fee, from a DOI or Handle (HDL) provider
Thank you
For further information:
communications@eosc-synergy.eu
www.eosc-synergy.eu
26
26

More Related Content

PPTX
The world of Docker and Kubernetes
 
PPTX
Building COVID-19 Museum as Open Science Project
 
PPTX
5 years of Dataverse evolution
 
PPTX
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
PPTX
Ontologies, controlled vocabularies and Dataverse
 
PPTX
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
PPTX
Technical integration of data repositories status and challenges
 
PPTX
Building COVID-19 Knowledge Graph at CoronaWhy
 
The world of Docker and Kubernetes
 
Building COVID-19 Museum as Open Science Project
 
5 years of Dataverse evolution
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
Ontologies, controlled vocabularies and Dataverse
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
Technical integration of data repositories status and challenges
 
Building COVID-19 Knowledge Graph at CoronaWhy
 

What's hot (20)

PPTX
Setting up Dataverse repository for research data
 
PPTX
External controlled vocabularies support in Dataverse
 
PPTX
CLARIN CMDI support in Dataverse
 
PPTX
Controlled vocabularies and ontologies in Dataverse data repository
 
PPTX
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
PPTX
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
PPTX
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
PPTX
Building an electronic repository and archives on Dataverse in the European O...
 
PPTX
CLARIN CMDI use case and flexible metadata schemes
 
PPTX
Metaverse for Dataverse
 
PPTX
External CV support in Dataverse 5.7
 
PPTX
CLARIAH CMDI use case and flexible metadata schemes
PDF
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
PPTX
Flexible metadata schemes for research data repositories - Clarin Conference...
PPTX
The Extreme Data Cloud (XDC) Project
PPTX
iRODS: Interoperability in Data Management
PPT
e-Infrastructure Integration-with gCube
 
PPT
D4Science scientific data infrastructure promoting interoperability by embrac...
 
PDF
OCCIware - A Framework for Everything as a Service - Cloud Expo London 2015
PDF
Setting up Dataverse repository for research data
 
External controlled vocabularies support in Dataverse
 
CLARIN CMDI support in Dataverse
 
Controlled vocabularies and ontologies in Dataverse data repository
 
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and DAN...
 
Running Dataverse repository in the European Open Science Cloud (EOSC)
 
Dataverse SSHOC enrichment of DDI support at EDDI'19 2
 
Building an electronic repository and archives on Dataverse in the European O...
 
CLARIN CMDI use case and flexible metadata schemes
 
Metaverse for Dataverse
 
External CV support in Dataverse 5.7
 
CLARIAH CMDI use case and flexible metadata schemes
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Flexible metadata schemes for research data repositories - Clarin Conference...
The Extreme Data Cloud (XDC) Project
iRODS: Interoperability in Data Management
e-Infrastructure Integration-with gCube
 
D4Science scientific data infrastructure promoting interoperability by embrac...
 
OCCIware - A Framework for Everything as a Service - Cloud Expo London 2015
Ad

Similar to Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse (20)

PPTX
Quality and capacity expansion of thematic services in EOSC-SYNERGY
PPTX
eROSA Stakeholder WS1: EOSC Architecture
PDF
EOSC-synergy
PDF
PDF
OPENCoastS: An open-access service for producing on-demand coastal hydrodynam...
PDF
The Ascent of Open Science and the European Open Science Cloud
PDF
Reliance project introduction
PPTX
2019 05-21 egi and eosc - final
PPTX
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
PDF
1. Intervention on the expectations on behalf of the EC
PPTX
EOSC-hub in EOSC context
PPTX
European Open Science Cloud: Concept, status and opportunities
PDF
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
PPTX
Introduction to the EOSC-hub project
PPTX
EOSC-hub service portfolio
PPTX
2019 01-15 pa nosc kickoff
PPTX
EGI and EUDAT support to the PaNOSC project
PDF
1. EOSCsecretariat (Donatella Castelli, EOSCsecretariat)
PDF
Towards a Strategic Implementation of the EOSC & Addressing strategic priorit...
PPTX
2019 02-12 eosc-hub for eo
Quality and capacity expansion of thematic services in EOSC-SYNERGY
eROSA Stakeholder WS1: EOSC Architecture
EOSC-synergy
OPENCoastS: An open-access service for producing on-demand coastal hydrodynam...
The Ascent of Open Science and the European Open Science Cloud
Reliance project introduction
2019 05-21 egi and eosc - final
Prompting an EOSC in Practice, Isabel Campos, CSIC & Member of the High Level...
1. Intervention on the expectations on behalf of the EC
EOSC-hub in EOSC context
European Open Science Cloud: Concept, status and opportunities
Cloud Computing Needs for Earth Observation Data Analysis: EGI and EOSC-hub
Introduction to the EOSC-hub project
EOSC-hub service portfolio
2019 01-15 pa nosc kickoff
EGI and EUDAT support to the PaNOSC project
1. EOSCsecretariat (Donatella Castelli, EOSCsecretariat)
Towards a Strategic Implementation of the EOSC & Addressing strategic priorit...
2019 02-12 eosc-hub for eo
Ad

More from vty (11)

PPTX
Decentralised identifiers and knowledge graphs
 
PPTX
Decentralisation and knowledge graphs
 
PPTX
Decentralised identifiers for CLARIAH infrastructure
 
PPTX
Dataverse repository for research data in the COVID-19 Museum
 
PPTX
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
PPTX
Fighting COVID-19 with Artificial Intelligence
 
PPTX
SSHOC Dataverse in the European Open Science Cloud
 
PPTX
Dataverse in the European Open Science Cloud
 
PPTX
Data standardization process for social sciences and humanities
 
PPTX
Development in Dataverse SSHOC project
 
PPTX
DataverseEU as multilingual repository
 
Decentralised identifiers and knowledge graphs
 
Decentralisation and knowledge graphs
 
Decentralised identifiers for CLARIAH infrastructure
 
Dataverse repository for research data in the COVID-19 Museum
 
Flexible metadata schemes for research data repositories - CLARIN Conference'21
 
Fighting COVID-19 with Artificial Intelligence
 
SSHOC Dataverse in the European Open Science Cloud
 
Dataverse in the European Open Science Cloud
 
Data standardization process for social sciences and humanities
 
Development in Dataverse SSHOC project
 
DataverseEU as multilingual repository
 

Recently uploaded (20)

PDF
Xinzex: A Complete Web Development Guide for Beginners
PPTX
Driving Accountability The Power of Business Responsibility and Sustainabilit...
PDF
Why Infotrench Stands Out as the Best SEO Agency in Noida.pdf
PPTX
How After-School Art Classes Enhance Social Skills.pptx
PDF
The Rise of ICOs in Environmental and Sustainability Projects (2).pdf
PDF
Secure Your World with Acme Enterprises PDF Sharing.pdf
PDF
AI Staffing for Startups & Growing Businesses | Rubixe
PDF
Robert Hume San Diego_ How Firefighting Tools and Technology Have Transformed...
PDF
Choosing an Entrepreneurial Path Based on Your Personality.pdf
PDF
Looking to Work Abroad_ Here’s Why Canada is a Great Option.pdf
PPTX
Enhancing Wastewater Treatment Efficiency with GO2™ Water Treatment Chlorine ...
PDF
Musician Corporate Headshots Los Angeles
PDF
The Dark Web’s Front Door: Finding the Real Hidden Wiki
PPTX
Expert Tree Pruning & Maintenance Services in Sydney
PDF
Understanding LA's Zero Waste Initiative
PDF
The New Drive_ How the Transportation Business is Reinventing Itself by Ednei...
PDF
Risk Assessment Survey of the Esarbica 2025.pdf
PDF
Digital marketing strategy slides .pdf
PDF
Management Colleges In Delhi Ncr | Galgotias University
PDF
Blush & Brown Modern Minimalist eBook Workbook.pdf
Xinzex: A Complete Web Development Guide for Beginners
Driving Accountability The Power of Business Responsibility and Sustainabilit...
Why Infotrench Stands Out as the Best SEO Agency in Noida.pdf
How After-School Art Classes Enhance Social Skills.pptx
The Rise of ICOs in Environmental and Sustainability Projects (2).pdf
Secure Your World with Acme Enterprises PDF Sharing.pdf
AI Staffing for Startups & Growing Businesses | Rubixe
Robert Hume San Diego_ How Firefighting Tools and Technology Have Transformed...
Choosing an Entrepreneurial Path Based on Your Personality.pdf
Looking to Work Abroad_ Here’s Why Canada is a Great Option.pdf
Enhancing Wastewater Treatment Efficiency with GO2™ Water Treatment Chlorine ...
Musician Corporate Headshots Los Angeles
The Dark Web’s Front Door: Finding the Real Hidden Wiki
Expert Tree Pruning & Maintenance Services in Sydney
Understanding LA's Zero Waste Initiative
The New Drive_ How the Transportation Business is Reinventing Itself by Ednei...
Risk Assessment Survey of the Esarbica 2025.pdf
Digital marketing strategy slides .pdf
Management Colleges In Delhi Ncr | Galgotias University
Blush & Brown Modern Minimalist eBook Workbook.pdf

Integration of WORSICA’s thematic service in EOSC, Service QA and Dataverse

  • 1. www.eosc-synergy.euwww.eosc-synergy.eu Integration of WORSICA’s thematic service in EOSC - challenges and achievements LNEC: Ricardo Martins, Alberto Azevedo, Anabela Oliveira LIP: Samuel Bernardo, Jorge Gomes, Mário David, João Pina DANS-KNAW: Slava Tykhonov IFCA: Pablo Orviz
  • 2. www.eosc-synergy.eu Summary • What is WORSICA? • Challenges • Service Architecture (Past/Present) • Technical Description • Conclusions • Achievements • Future work 2
  • 3. www.eosc-synergy.eu What is WORSICA? • WORSICA (Water mOnitoRing SentInel Cloud plAtform) • a web service that aims at integrating remote sensing and in-situ data for the determination of water presence in coastal and inland areas, applicable to a range of purposes from the determination of flooded areas (from rainfall, storms, hurricanes or tsunamis) to the detection of large water leaks in major water irrigation networks. • WORSICA will reuse and connect some work from other projects/services from LNEC: • OPENCoastS (https://opencoasts.ncg.ingrid.pt) • WADI (https://www.waditech.eu) • Mosaic.pt (http://mosaic.lnec.pt) 3
  • 4. www.eosc-synergy.eu Coastline detection Use of remote sensing (Sentinel-2, Pleiades and multispectral drone imagery) for the detection of water-land interface and possible calculation of the Digital Elevation Model for each line using the EOSC-hub OPENCoastS service. WORSICA Main products Water bodies detection Determination of water indexes to detect water bodies in inland areas (lagoons, reservoirs, etc.), using satellite and drone-based imagery. Water leak detection Take advantage of the work developed in H2020-WADI project (with “low resolution” images from sentinel-2) and try to improve it using Pleiades and drone-based imagery. 4
  • 5. www.eosc-synergy.eu WORSICA User Community and Usage model ○ Users: researchers in coastal engineering as well as water irrigation networks management. ○ Access: will be done via a single portal. ○ Usage: ○ Configure and run the workflow, by choosing the Region of Interest (ROI) and the imagesets [1]. The service will download the imagesets, process them and generate the final products (water maps) [2]. ○ The user will also be able to upload their own data from drone surveys or other private satellite images (e.g. Pleiades) to be processed. [1] [2] 5
  • 6. www.eosc-synergy.eu WORSICA Processing Workflow ○ Workflow ○ This processing workflow is done sequentially. ○ For bigger ROIs, more inputs, more time and more resources are needed to process. ○ Applying this workflow to a big processing job (e.g water leak detection), that require processing hundreds of imagesets, takes a lot of time to complete (days). ○ For a ROI, for the same day, it can have more than one imageset available for download (different orbits), thus the need for merging first inputs. 1B) Download Imagesets 1A) User Uploaded Imagesets 2) Atmospheric correction 3) Resampling +crop 4) Merging inputs 5) RGB Sentinel2 L1C? Start Finish If Public source (ESA Sentinel) If Private source (Pleiades/drone/other VHR) For each imageset input... 6) Water Indexes +Filtering 6 7) Final outputs
  • 7. www.eosc-synergy.eu WORSICA Architecture (State before EOSC-Synergy) ○ Backend receives and responds to user requests, checks available imagesets and stores final products to the DB. ○ A ‘service’ will download the imageset, then start processing it using the available tools from the toolbox. Architecture ○ Everything on the Cloud. ○ No dockers. ○ Frontend provides the portal to the user and communicates with it’s backend 7
  • 8. www.eosc-synergy.eu WORSICA Challenges on EOSC-Synergy ○ Processing: ○ Network: Download speeds and number of simultaneous downloads of satellite data from operational providers (e.g. ESA or Pleiades); ○ Storage: of the intermediate and final products; ■ Each input is a imageset with ~1,1GB of size, ■ Intermediate and final outputs will be produced during the processing ○ Computation resources: where the GPU and RAM are highly recommended to speedup the image processing and to prevent bottlenecks on using the service during processing. ■ Processing a imageset requires at least 8GB of RAM ■ More will be required if imageset requires atmospheric correction. ■ Processing workflow needs to be parallelized. ○ Assure Resilience and Speedup. ○ Thematic service: ○ Assure Scalability, Redundancy, Portability of the service ○ Assure (inter)operability with other thematic services ○ Other: ○ Assure Data FAIRness of the user generated products on this thematic service 8
  • 9. www.eosc-synergy.eu WORSICA Architecture (Actual state during EOSC-Synergy) Architecture: ○ Frontend component (web portal on Cloud) ○ Intermediate component (simple task orchestrator w/ Celery on Cloud) ○ Processing component (an instance that will be sent to GRID infrastructure). ○ Dataverse/Storage/Broker/Database are isolated components Advantages ○ Provides redundancy, scalability and flexibility ○ Good weight distribution on each component ○ Components are using dockers, easier for portability and installation ○ Processing jobs submissions are sent by the WORSICA Intermediate service to a GRID infrastructure 9
  • 10. www.eosc-synergy.eu Technical Description: Services Service Used Provider Before Planned Before Planned AAI Local EGI Check-In INCD EGI Federation Workload Mng. Local batch system ArcCE, SLURM INCD EOSC-Synergy Resource Mng. Manual IM (TOSCA) INCD EOSC-Synergy Data Storage Local Nextcloud INCD EOSC-Synergy Monitoring - ARGO - EGI Service Monitoring Other: Hydrodynamic water forecasts - OPENCoastS - EOSC-hub marketplace Other: Dataverse - Dataverse - EOSC-Synergy Computing Resources Local FedCloud and EGI HTC INCD EOSC-Synergy Storage Resources Local FedCloud and EGI Online storage INCD EOSC-Synergy 10
  • 11. www.eosc-synergy.eu Technical Description: Planned Services ○ Authentication: ○ EGI Check-In: federated authentication is required on WORSICA to have access to the available EOSC services and resources. ○ Workload Managers: ○ ArcCE with SLURM: This allows efficient management of the available GRID resources for HPC in order to speed up the processing jobs. ○ Data Manager: ○ Nextcloud: to store processed job submission data input/outputs. ○ Dataverse: to register processed job submission metadata information for data FAIRsFAIR compliance. (more on the Service QA and Dataverse presentation) ○ Ansible and IM: ○ IM: for deployment of the infrastructure required for job processing, repositories and microservices. ○ SLURM and Kubernetes clusters are deployed using TOSCA template over IaaS service and the remaining services will be installed from Docker images. Configurations for SLURM and Kubernetes are set up by ansible playbooks. ○ CI/CD for the automatization of the service integration in EOSC infrastructure: ○ Jenkins pipelines and unitary/functional tests were also developed to be compliant with the SQAaaS methodologies developed in EOSC-Synergy (more on the Service QA and Dataverse presentation) 11
  • 12. www.eosc-synergy.eu Conclusions WORSICA Achievements on EOSC-Synergy ○ Resilience+Speedup: a better GRID infrastructure to process Sentinel and other VHR imagery, using robust and tested software. ○ Scalability+Redundancy: possibility to adjust the resources according to the usage. More users, more resources for the computation ○ Portability+Deployment: possibility to port this thematic service to any other infrastructure depending of the computational needs ○ Federated access+Support: the need for a federated access is a requirement on WORSICA to have access and support to the resources and other EOSC services. ○ Interoperability: this thematic service can be a connection for other thematic services, and can connect with other thematic services to provide additional products. ○ Data FAIRness: provide data FAIRness to the WORSICA user generated products. 12
  • 13. Future Work
    ● Improve the existing water-processing algorithms and/or implement new ones.
    ● Improve user interaction with the portal.
    ● Improve interoperability with other thematic services.
    ● Continuously improve and update the IT services implemented in the WORSICA thematic service.
  • 15. Service QA and Dataverse
    Speaker: Vyacheslav Tykhonov (DANS-KNAW) on behalf of EOSC-Synergy
    Collaborators: Jorge Gomes (LIP), João Pina (LIP), Mário David (LIP), Ricardo Martins (LNEC), Alberto Azevedo (LNEC), Samuel Bernardo (LIP), Wilko Steinhoff (DANS-KNAW), Pablo Orviz (CSIC), Isabel Campos (CSIC), Germán Moltó (UPV) and Miguel Caballer (UPV)
  • 16. Introducing the Dataverse Data Repository
    ● Open-source project developed by IQSS (Institute for Quantitative Social Science) at Harvard University
    ● A great product with a long history (since 2006) and a dynamic, experienced development team
    ● Clear vision and understanding of research communities' requirements; public roadmap
    ● Well-developed architecture whose rich APIs allow application layers to be built around Dataverse
    ● A strong community behind Dataverse helps improve the basic functionality and develop it further. DANS-KNAW leads the SSHOC task to deliver a production-ready Dataverse for all partners.
  • 17. WORSICA and Dataverse Repository (1)
    Initial service architecture:
    • Data lacked a globally unique identifier
    • Data was stored in multiple places internal to the services and was not accessible
    • No detailed provenance metadata was associated with the data
    • Data access did not follow controlled vocabularies that apply the FAIR principles
  • 18. Dataverse Repository (2)
    FAIR service architecture:
    • Dataverse provides a repository solution that complies with the FAIR principles
    • Define a dataverse and associate it with a persistent identifier namespace
    • Associate metadata with both the provided and the produced data
    • Use Data Commons to share data between all teams and projects
    • By default, metadata is released under CC0 (Creative Commons public-domain dedication) and is publicly accessible
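Once a persistent identifier namespace is associated with the dataverse, any client can resolve a dataset from its PID alone. The sketch below builds the Dataverse native-API lookup URL (`GET /api/datasets/:persistentId/?persistentId=…`) and rejects anything that is not a `doi:` or `hdl:` identifier, the two PID types mentioned on slide 25. The server URL and the DOI in the usage example are illustrative: `10.5072` is DataCite's test prefix and the suffix is invented.

```python
from urllib.parse import urlencode

# The DOI and Handle PID schemes mentioned in these slides
PID_PROTOCOLS = ("doi", "hdl")


def dataset_by_pid_url(server: str, pid: str) -> str:
    """Build the native-API URL that retrieves a dataset by its persistent identifier."""
    protocol, sep, identifier = pid.partition(":")
    if not sep or protocol not in PID_PROTOCOLS or not identifier:
        raise ValueError(f"not a doi: or hdl: persistent identifier: {pid!r}")
    query = urlencode({"persistentId": pid})  # percent-escapes ':' and '/' in the PID
    return f"{server.rstrip('/')}/api/datasets/:persistentId/?{query}"


if __name__ == "__main__":
    print(dataset_by_pid_url("https://demo.dataverse.org", "doi:10.5072/FK2/ABCDEF"))
```

Validating the prefix locally catches malformed identifiers before any request reaches the repository.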
  • 19. Dataverse Repository (3)
    Integrating the code with the Dataverse REST API:
    • Can be implemented in any language, depending only on the provided interface, with no client-library requirements
    • Easy to keep the WORSICA code in step with Dataverse service updates
    • The current Dataverse REST API is very complete and supports all the necessary operations
    • Share sensitive data with confidence using the DataTags system, which will apply a set of security features and access requirements to file handling
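Because the integration depends only on HTTP, the standard library is enough to talk to Dataverse. The sketch below prepares (without sending) the native-API call that creates a dataset inside a dataverse, `POST /api/dataverses/{alias}/datasets`, authenticated with the `X-Dataverse-key` header. The server URL, the `worsica` alias and the token are placeholders, and `EMPTY_DATASET` is only a skeleton: a real deposit must fill in the required citation fields (title, author, contact, description, subject).

```python
import json
import urllib.request


def create_dataset_request(server: str, api_token: str, dataverse_alias: str,
                           dataset: dict) -> urllib.request.Request:
    """Prepare (but do not send) a native-API call that creates a dataset in a dataverse."""
    url = f"{server.rstrip('/')}/api/dataverses/{dataverse_alias}/datasets"
    return urllib.request.Request(
        url,
        data=json.dumps(dataset).encode("utf-8"),
        method="POST",
        headers={
            "X-Dataverse-key": api_token,  # API token of the depositing account
            "Content-Type": "application/json",
        },
    )


# Skeleton of the dataset JSON; the citation fields are intentionally left empty here
EMPTY_DATASET = {
    "datasetVersion": {
        "metadataBlocks": {"citation": {"displayName": "Citation Metadata", "fields": []}}
    }
}
```

Calling `urllib.request.urlopen(req)` on the returned object would send it; separating construction from sending keeps the request easy to unit-test in the CI pipeline.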
  • 20. Interoperability in WORSICA/Dataverse
    ● Most variables in WORSICA datasets can be linked in Dataverse to the appropriate ontologies, increasing interoperability and data FAIRness
    ● Variable names can be included in the dataset metadata in their native language (Portuguese) and be given URI identifiers for those entities in controlled vocabularies
    ● Standardized metadata fields are available in the Linked Open Data Cloud through the standard machine-to-machine interfaces provided by Dataverse
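The slides do not say which metadata field carries these links. One possible approach, assumed here, uses the citation block's `keyword` compound field, whose `keywordValue`, `keywordVocabulary` and `keywordVocabularyURI` subfields let a Portuguese label point at a concept URI. The helper names are hypothetical; the example links "país" (Portuguese for "country") to the Wikidata entity cited on slide 22.

```python
def primitive(type_name: str, value: str) -> dict:
    """A single-valued primitive field in Dataverse's JSON metadata format."""
    return {"typeName": type_name, "multiple": False, "typeClass": "primitive", "value": value}


def keyword_field(terms: list) -> dict:
    """Build the citation block's 'keyword' compound field.

    Each term is a (label, vocabulary name, vocabulary URI) tuple, so a
    label in any language can carry the URI of the concept it denotes.
    """
    return {
        "typeName": "keyword",
        "multiple": True,
        "typeClass": "compound",
        "value": [
            {
                "keywordValue": primitive("keywordValue", label),
                "keywordVocabulary": primitive("keywordVocabulary", vocabulary),
                "keywordVocabularyURI": primitive("keywordVocabularyURI", uri),
            }
            for label, vocabulary, uri in terms
        ],
    }


# Illustrative mapping: Portuguese label -> Wikidata concept "country"
example = keyword_field([("país", "Wikidata", "https://www.wikidata.org/wiki/Q6256")])
```

The label stays human-readable in the original language while the URI gives machines a language-independent identifier.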
  • 21. WORSICA Dataverse Metadata (1)
    Scope: Citation. Each entry gives Type (subfield names): description, then an example and a vocabulary URI where one exists.
    ○ title: a title for this dataverse. Example: "Figueira da Foz (ROI110 Simulation230)"
    ○ author (Multiple; authorName, authorAffiliation): the author or list of authors who ran the WORSICA processing that generated this dataverse. Example: "Ricardo Martins", "LNEC"
    ○ datasetContact (Multiple; datasetContactName, datasetContactEmail): the author or list of authors who own this dataset. Example: "Ricardo Martins", "rjmartins(at)lnec.pt"
    ○ dsDescription (Multiple; dsDescriptionValue): one or more detailed descriptions of the dataset. Example: "Simulation230 in ROI110 (38,-8,37,-7) with minimum bath depth threshold of -10m and maximum topo depth threshold of 10m"
    ○ subject (Multiple): an array of thematic subjects that identify the dataverse; these subjects must match the ones provided by Dataverse. Example: ["Computer and Information Science", "Earth and Environmental Sciences"]. Vocabulary URI: https://www.wikidata.org/wiki/Q21198
    ○ dataSource (Multiple): the data sources (names of the processed image sets...)
    ○ otherReferences (Multiple): other references for this dataset (URLs of the processed WORSICA products in storage…)
    ○ software (Multiple; softwareName, softwareVersion): the software used for the processing. Example: "GDAL", "3.0.4". Vocabulary URI: https://www.wikidata.org/wiki/Q676202
    ○ displayName: a name for this dataverse scope. Example: "Figueira da Foz (ROI110 Simulation230) Citation Metadata"
    Notes:
    ● These variables may change during WORSICA development.
    ● (Multiple) means the field accepts more than one object.
    ● This applies to Dataverse 4.19.
    ● For further information, check the Dataverse documentation.
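The citation fields above map onto Dataverse's JSON metadata format, in which every field is an object with `typeName`, `multiple`, `typeClass` and `value`. The sketch below assembles a citation block from the slide's example values, as we understand that format. The helper names are hypothetical, the contact address is a placeholder (the real one is obfuscated on the slide), and only the title, author, contact, description and subject fields are covered.

```python
def primitive(type_name, value):
    """Single-valued primitive field in Dataverse's JSON metadata format."""
    return {"typeName": type_name, "multiple": False, "typeClass": "primitive", "value": value}


def compound(type_name, entries):
    """Multi-valued compound field; each entry maps subfield names to values."""
    return {
        "typeName": type_name,
        "multiple": True,
        "typeClass": "compound",
        "value": [{name: primitive(name, value) for name, value in entry.items()}
                  for entry in entries],
    }


def citation_block(title, authors, contacts, descriptions, subjects,
                   display_name="Citation Metadata"):
    """Assemble a citation metadata block from (a subset of) the fields on the slide."""
    return {
        "displayName": display_name,
        "fields": [
            primitive("title", title),
            compound("author", [{"authorName": n, "authorAffiliation": a} for n, a in authors]),
            compound("datasetContact", [{"datasetContactName": n, "datasetContactEmail": e}
                                        for n, e in contacts]),
            compound("dsDescription", [{"dsDescriptionValue": d} for d in descriptions]),
            # subject is a controlled vocabulary: values must be Dataverse's own subject terms
            {"typeName": "subject", "multiple": True,
             "typeClass": "controlledVocabulary", "value": list(subjects)},
        ],
    }


block = citation_block(
    "Figueira da Foz (ROI110 Simulation230)",
    authors=[("Ricardo Martins", "LNEC")],
    contacts=[("Ricardo Martins", "contact@example.org")],  # placeholder address
    descriptions=["Simulation230 in ROI110 (38,-8,37,-7) with minimum bath depth "
                  "threshold of -10m and maximum topo depth threshold of 10m"],
    subjects=["Computer and Information Science", "Earth and Environmental Sciences"],
)
```

Generating the block from processing parameters, rather than by hand, keeps every deposited run described by the same set of fields.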
  • 22. WORSICA Dataverse Metadata (2)
    Scope: Geospatial. Each entry gives Type (subfield names): description, then an example and a vocabulary URI where one exists.
    ○ geographicCoverage (Multiple; country, state, city, otherGeographicCoverage): details of the processed ROI. Example: Portugal, Lisbon, Lisbon, Av. do Brasil. Vocabulary URIs: https://www.wikidata.org/wiki/Q6256, https://www.geonames.org/countries/PT/portugal.html
    ○ geographicUnit (Multiple): a list of default units. Example: ["m", "m"]
    ○ geographicBoundingBox (Multiple; westLongitude, eastLongitude, northLongitude, southLongitude): a bounding box or list of bounding boxes representing the processed ROI (northLongitude and southLongitude hold the north and south latitudes). Example: -9, -8, 38, 37. Vocabulary URI: https://www.wikidata.org/wiki/Property:P625
    ○ displayName: a name for this dataverse scope. Example: "Figueira da Foz (ROI110 Simulation230) Geospatial Metadata"
    Notes:
    ● These variables may change during WORSICA development.
    ● (Multiple) means the field accepts more than one object.
    ● This applies to Dataverse 4.19.
    ● For further information, check the Dataverse documentation.
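A malformed bounding box is easy to deposit and hard to spot later, so it is worth checking before upload. The sketch below builds the `geographicBoundingBox` compound field from the four bounds listed above, after basic range checks. The helper name and the checks are assumptions; note that it only enforces south ≤ north, because west > east is legitimate for boxes crossing the antimeridian.

```python
def bounding_box_field(west: float, east: float, north: float, south: float) -> dict:
    """Build a geographicBoundingBox compound field after basic range checks."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    # Only south <= north is enforced; west > east is allowed for boxes
    # that cross the antimeridian
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # The latitude bounds use the field names northLongitude/southLongitude
    bounds = {"westLongitude": west, "eastLongitude": east,
              "northLongitude": north, "southLongitude": south}
    return {
        "typeName": "geographicBoundingBox",
        "multiple": True,
        "typeClass": "compound",
        "value": [{name: {"typeName": name, "multiple": False,
                          "typeClass": "primitive", "value": str(v)}
                   for name, v in bounds.items()}],
    }


# Example box from the slide: west -9, east -8, north 38, south 37
box = bounding_box_field(west=-9, east=-8, north=38, south=37)
```

Passing the bounds as keyword arguments avoids silently swapping latitude and longitude, a common source of errors in ROI metadata.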
  • 23. SQA Process with Selenium Tests for Dataverse
    ● Selenium IDE lets you record and replay UI tests in the browser
    ● Shared tests can be reused by the community to increase reproducibility
    ● SQA for service maturity = unit tests + integration tests
  • 24. CI/CD Pipeline with SQAaaS
    [Diagram: git push to GitHub → webhook → Jenkins pipeline (Jenkinsfile) → git clone → run SQA → create Docker image → push to GCP container registry → Kubernetes deployment; (S) marks the SQAaaS steps]
    1. The developer pushes code to GitHub.
    2. Jenkins receives a notification (build trigger).
    3. Jenkins clones the workspace.
    4. (S) Runs the SQA tests and a FAIRness check.
    5. (S) Issues a digital badge according to the results.
    6. (S) The SQAaaS API triggers the appropriate workflow.
    7. Creates a Docker image if successful.
    8. Pushes the new Docker image to the container registry.
    9. Updates the Kubernetes deployment.
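The key property of this pipeline is gating: steps 7 to 9 run only if the SQA stage succeeds, while the badge reflects every check. The sketch below models that logic in Python. It is a simplified stand-in, not the actual Jenkinsfile or the SQAaaS API: the check names are hypothetical, step 6 is folded into the check loop, and the list of passed checks stands in for the badge criteria.

```python
from typing import Callable, Dict, List, Tuple


def run_pipeline(checks: Dict[str, Callable[[], bool]],
                 deploy_steps: List[Callable[[], None]]) -> Tuple[List[str], bool]:
    """Run the SQA checks, then the deployment steps only if every check passed.

    Returns the names of the passed checks (a stand-in for the badge
    criteria) and whether deployment happened.
    """
    # Steps 4-5: run every check, even after a failure, so the badge reflects all results
    passed = [name for name, check in checks.items() if check()]
    if len(passed) < len(checks):
        return passed, False  # steps 7-9 are skipped on any failure
    # Steps 7-9: build the image, push it to the registry, update the deployment
    for step in deploy_steps:
        step()
    return passed, True
```

Running all checks before deciding, instead of stopping at the first failure, means a failed build still produces a complete quality report.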
  • 25. Final Remarks
    Dataverse pros:
    • Provides a FAIR repository for thematic services, with a rich REST interface
    • Open-source software under the Apache License v2.0
    • Manages both public and private data
    • Data Commons sharing across teams and projects
    Dataverse cons:
    • Integrating data management with Dataverse took longer than expected because of the learning curve
    • For persistent identifiers to be citable, an account and an associated namespace must be acquired, for a fee, from a DOI or Handle (HDL) provider
  • 26. Thank you
    For further information: communications@eosc-synergy.eu
    www.eosc-synergy.eu