e-Infrastructure Integration with gCube
Andrea Manzi (CERN), Pasquale Pagano (ISTI-CNR)
EGI User Forum, 13 April 2011, Vilnius (Lithuania)
www.d4science.eu

Outline
  • D4Science II Ecosystem
  • gCube architecture
  • Interoperability approaches
      - Resource Discovery
      - Data Storage & Access
      - Data Discovery
      - Data Process
      - Security
  • Applications
      - AquaMaps
      - Time Series

D4Science II Ecosystem
  • Heterogeneous resources
  • Heterogeneous computational platforms
  • Rich set of legacy applications
  • Multiple administrative domains
  • Evolving communities
[Diagram labels: D4SCIENCE INFRASTRUCTURE; Hadoop; EGEE/EGI; INSPIRE; DRIVER; GENESI-DR; AquaMaps; FAO Geonetwork; FAO FIGIS; Community A; Community B]

gCube architecture
[Architecture diagram; labels in the order they were extracted]
  • gCube run-time environment
  • gCube Definition and Management Services
  • gCube Application Services
  • Presentation Services: Portlets, Application Support Layer
  • Information Organization Services: Storage Management; Collection, Content, Metadata, Annotation, … Management
  • Information Access Services: Search Framework, Ontology Management, Personalization Service, Index Management Framework, DIR Support Framework
  • Process Execution Management, VRE Management, Information System, Security
  • gCube Container, gCore Framework
  • User Services

Virtual Organization
  • A Virtual Organization (VO) specifies how a set of users can access a set of resources:
      - what is shared
      - who is allowed to share
      - the conditions under which sharing can occur
  • The concept of VO is not adequate to cover some common scenarios:
      - Data needs to be assessed before it is made publicly exploitable by the VO members.
      - A restricted set of users has to collaborate to refine processes and implement showcases.
      - Products generated through elaboration of data or simulation have to be validated by expert users.

Virtual Research Environment
  • A Virtual Research Environment (VRE) is a distributed and dynamically created environment where a subset of resources can be assigned to a subset of users via interfaces, for a limited timeframe, at little or no cost for the providers of the infrastructure
  • Integrated with cloud systems (OpenNebula)
  • gCube is a first example of a VRE management system
[Diagram labels: VO; VRE 1; VRE 2]

Interoperability: Assumptions
  • Very rich applications and data collections are currently maintained by a multitude of authoritative providers
  • Different problems require different execution paradigms: batch, map-reduce, synchronous call, message-queue, …
  • Key distributed computation technologies exist: grid (gLite and Globus), distributed resource management (Condor), clusters (Hadoop), …
  • Several standards are adopted in the same domain

Interoperability: Landscape
  • Unstructured data: blob (binary) and textual files
  • Structured data: tabular, statistical, geospatial, temporal, and textual data
  • Compound data: data composed of unstructured and structured data entities
[Diagram label: security]

Interoperability: gCube Vision
  • gCube objectives:
      - hide heterogeneity, i.e. abstract over differences in location, protocol, and model
      - embrace heterogeneity, i.e. allow for multiple locations, protocols, and models
  • Technical goals:
      - no bottlenecks: scale no less than the interfaced resources
      - no outages: keep failures partial and temporary
      - autonomicity: the system reacts and recovers

Hiding Heterogeneity
  • Heterogeneous resources are virtually accessible in a common ecosystem of resources, despite their locations, technologies, and protocols
  • Different communities have access to different views, according to the conditions under which the sharing can occur
  • Each community can define its own VRE for a limited timeframe and at no cost for the providers of the resources
  • Several VREs can coexist without interfering with each other, even when competing for the same resources

Embracing Heterogeneity
Approaches and solutions to achieve interoperability:
  • Blackboard-based
      - asynchronous communication between components in a system
      - one protocol to read/write and one language to specify messages
  • Wrapper/Mediator-based
      - translates one interface for a component into a compatible interface
  • Adaptor-based
      - provides a unified interface to a set of other components' interfaces and encapsulates how this set of objects interact

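The difference between the wrapper/mediator and adaptor approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`LegacyStore`, `StoreWrapper`, `InMemoryStore`, `StorageAdaptor`) and the `read` method are hypothetical and are not part of gCube.

```python
class LegacyStore:
    """Hypothetical back-end with its own, incompatible interface."""

    def fetch_blob(self, key: str) -> bytes:
        return f"legacy:{key}".encode()


class StoreWrapper:
    """Wrapper/mediator: translates one interface into a compatible one."""

    def __init__(self, legacy: LegacyStore) -> None:
        self._legacy = legacy

    def read(self, doc_id: str) -> str:
        # Translate the call and the result type; no other logic is added.
        return self._legacy.fetch_blob(doc_id).decode()


class InMemoryStore:
    """Hypothetical back-end that already offers the common interface."""

    def __init__(self, docs: dict) -> None:
        self._docs = docs

    def read(self, doc_id: str) -> str:
        return self._docs[doc_id]


class StorageAdaptor:
    """Adaptor: one unified interface over a set of components,
    encapsulating how those components are consulted."""

    def __init__(self, stores) -> None:
        self._stores = list(stores)

    def read(self, doc_id: str) -> str:
        # Try each store in order; the caller never sees which one answered.
        for store in self._stores:
            try:
                return store.read(doc_id)
            except KeyError:
                continue
        raise KeyError(doc_id)
```

A caller would build `StorageAdaptor([InMemoryStore({...}), StoreWrapper(LegacyStore())])` and call `read` without knowing which back-end holds the document. The wrapper changes the shape of a single interface, while the adaptor hides the interaction among several components.
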
Interoperability Approaches: Resource Discovery
gCube interoperability framework: the solution
  • Each resource is represented by a profile (metadata) characterising:
      - the interface
      - the state
      - the list of dependencies
      - the run-time status
      - the policies
      - the configuration
      - the pending tasks to execute
  • A resource profile is published by the resource owner and discovered by the resource consumers, asynchronously, through a common resource-independent protocol
  • gCube offers a distributed and scalable Information System (blackboard) to store, discover, and access resource profiles

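The publish/discover cycle over resource profiles can be illustrated with a toy in-memory blackboard. This is a minimal sketch and not the gCube Information System API: `ResourceProfile`, `Blackboard`, `publish`, and `discover` are hypothetical names, and the profile carries only a few of the attributes listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProfile:
    """Simplified profile: a subset of the attributes named on the slide."""
    resource_id: str
    interface: str
    state: str = "ready"
    dependencies: list = field(default_factory=list)
    policies: dict = field(default_factory=dict)


class Blackboard:
    """Toy in-memory stand-in for a distributed information system."""

    def __init__(self):
        self._profiles = {}

    def publish(self, profile):
        # Publishing again under the same id replaces the old profile.
        self._profiles[profile.resource_id] = profile

    def discover(self, **criteria):
        # Return every profile whose attributes match all given criteria.
        return [
            p for p in self._profiles.values()
            if all(getattr(p, key, None) == value for key, value in criteria.items())
        ]
```

Owners and consumers never talk to each other directly: an owner calls `publish(profile)`, and a consumer later calls `discover(interface="index", state="ready")`. The same two operations serve any resource type, which is what makes the protocol resource-independent.
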
Interoperability Approaches: Content Interoperability [1/2]
gCube Open Content Management Architecture (OCMA)
  • Assumptions
      - data is stored in different storage back-ends
      - diverse locations, models, access types
      - few common primitives: documents, collections, repositories
  • gCube makes it possible to
      - reach content that lies outside the system
      - expose content (reachable from) inside the system
      - perform coarse-grained as well as fine-grained retrieval, update, and addressing
  • Runtime scalability
      - autonomic read-only state replication
      - maximize throughput, minimize response time: discovery-time load balancing (through the IS)
      - reduce latencies
  • Software
      - plugin-based architecture to reduce development costs (plugins over storage systems)

Interoperability Approaches: Content Interoperability [2/2]
  • Content Manager Service (OCMA Service): adapts the gCube document model (gDoc) to an unbounded number of back-end types
[Diagram labels: factory; gDoc Read; gDoc Write; adapts T1; adapts T2; …]

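A plugin-based factory of this kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: `GDoc` is a two-field stand-in for the gCube document model, and `FilePlugin`, `RowPlugin`, and `ContentManagerFactory` are hypothetical names, not the OCMA interfaces.

```python
from dataclasses import dataclass


@dataclass
class GDoc:
    """Simplified stand-in for the gCube document model."""
    doc_id: str
    content: str


class FilePlugin:
    """Hypothetical plugin for a back-end that maps paths to text."""

    def __init__(self, files):
        self._files = files

    def read(self, doc_id):
        return GDoc(doc_id, self._files[doc_id])

    def write(self, doc):
        self._files[doc.doc_id] = doc.content


class RowPlugin:
    """Hypothetical plugin for a back-end that stores rows in a list."""

    def __init__(self, rows):
        self._rows = rows

    def read(self, doc_id):
        for row in self._rows:
            if row["id"] == doc_id:
                return GDoc(doc_id, row["body"])
        raise KeyError(doc_id)

    def write(self, doc):
        # Drop any existing row with the same id, then append the new one.
        self._rows[:] = [r for r in self._rows if r["id"] != doc.doc_id]
        self._rows.append({"id": doc.doc_id, "body": doc.content})


class ContentManagerFactory:
    """Maps a back-end type name to the plugin that adapts it."""

    def __init__(self):
        self._plugins = {}

    def register(self, backend_type, plugin_cls):
        self._plugins[backend_type] = plugin_cls

    def create(self, backend_type, backend):
        return self._plugins[backend_type](backend)
```

Supporting a new back-end type means registering one more plugin class; callers keep using `read` and `write` on `GDoc` values, so a document read from one back-end can be written to another unchanged.
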
Interoperability Approaches: Data Discovery
gCube offers several index types:
  • Forward indexing, which supports ultra-fast lookups on tabular typed metadata
  • XML indexing, which supports semistructured lookups on content metadata
  • Textual field indexing, which supports full-text and qualified lookups on (mainly) textual metadata
  • Metadata full-text indexing, which enables full-text lookups on metadata
  • Content full-text indexing, which enables full-text lookups on text extracted from content
  • Geospatial/temporal indexing, which enables geospatial proximity and coverage queries to be executed over geospatial/temporal metadata
  • Feature indexing, which enables high-dimension vector indexing for feature lookup (currently inactive)

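To make the contrast between two of these index types concrete, the sketch below pairs an exact-match lookup on one typed metadata field with a simple inverted index for full-text lookups. It assumes that "forward indexing" means a value-to-documents lookup on a typed field; the class names and methods are hypothetical and do not reflect the gCube index services.

```python
from collections import defaultdict


class ForwardIndex:
    """Exact-match lookups on a single typed metadata field."""

    def __init__(self, field_name):
        self.field_name = field_name
        self._by_value = defaultdict(set)

    def add(self, doc_id, metadata):
        self._by_value[metadata[self.field_name]].add(doc_id)

    def lookup(self, value):
        return set(self._by_value.get(value, set()))


class FullTextIndex:
    """Inverted index: token -> ids of the documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in text.lower().split():
            self._postings[token].add(doc_id)

    def lookup(self, query):
        # A document matches only if it contains every query token.
        tokens = query.lower().split()
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

The forward index answers "which documents have `year == 2010`" with a single dictionary access, while the full-text index intersects posting lists, one per query token.
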
Interoperability Approaches: Process Execution [1/2]
gCube offers solutions to:
  • Decouple the business-domain and infrastructure-specific logic from the core "execution" functionality
  • Invoke a wide range of logic components: SOAP and REST Web Services, shell scripts, executable binaries, POJOs, …
  • Support most of the execution paradigms: batch, map-reduce, synchronous call
  • Bridge key distributed computation technologies: grid (gLite and Globus), Condor, Hadoop
  • Control and monitor the execution of a processing flow
  • Stage data among different storage providers
  • Stream data among computation elements

Interoperability Approaches: Process Execution [2/2]
  • Adaptors that each operate on a specific third-party language and translate it into native constructs allow for the creation of complex workflows that exploit several diverse technologies deployed on different infrastructures

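The adaptor idea can be sketched as a set of translation functions that turn third-party job descriptions into one native list of steps. This is an illustration only: `Step`, `adapt_condor`, `adapt_hadoop`, and `build_workflow` are hypothetical names, the "native construct" is reduced to a target label plus a command string, and nothing is actually submitted or executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical native construct: where to run and what to run."""
    target: str
    command: str


def adapt_condor(submit_text):
    """Translate 'key = value' lines of a Condor-style submit description."""
    fields = {}
    for line in submit_text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            fields[key.strip().lower()] = value.strip()
    command = fields["executable"]
    if fields.get("arguments"):
        command += " " + fields["arguments"]
    return [Step("condor", command)]


def adapt_hadoop(job):
    """Translate a dict describing a Hadoop job (keys are assumptions)."""
    return [Step("hadoop", f"hadoop jar {job['jar']} {job['main_class']}")]


# One adaptor per third-party description language.
ADAPTORS = {"condor": adapt_condor, "hadoop": adapt_hadoop}


def build_workflow(parts):
    """Concatenate the native steps produced by each part's adaptor."""
    plan = []
    for kind, description in parts:
        plan.extend(ADAPTORS[kind](description))
    return plan
```

Because every adaptor emits the same `Step` type, a single workflow can mix a Condor job and a Hadoop job, and the engine that consumes the plan needs no knowledge of either description language.
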
Interoperability Approaches: Security [1/2]
gCube offers solutions:
  • To secure access to gCube resources for interoperable external systems (incoming security)
  • To ensure interoperability of gCube security mechanisms with standards-compliant security systems (reuse)
  • To facilitate secure access to external resources for gCube services (outgoing)

Interoperability Approaches: Security [2/2]
  • Authz:
      - XACML for the authz request/response protocol and policy definition
      - SAML assertions to transport user/service authN information
      - Argus-based approach (EMI Authz framework), with a pluggable design to integrate additional PIPs
      - SAML Profile for XACML 2.0, following the OASIS Authorization Interoperability Profile Specification
  • AuthN:
      - Production-level SSL/HTTPS support
      - Key- and Trust-Manager

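The request/decision flow that XACML standardises can be illustrated with a much-simplified sketch of a deny-overrides rule combination. The assumptions are significant: real XACML policies and requests are XML documents, decisions also include Indeterminate, and targets are richer than attribute equality. The dictionary layout and function names here are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    PERMIT = "Permit"
    DENY = "Deny"
    NOT_APPLICABLE = "NotApplicable"


def evaluate_rule(rule, request):
    # A rule applies only if every attribute in its target matches the request.
    if all(request.get(key) == value for key, value in rule["target"].items()):
        return rule["effect"]
    return Decision.NOT_APPLICABLE


def deny_overrides(rules, request):
    """Simplified deny-overrides: any Deny wins, otherwise any Permit wins."""
    decisions = [evaluate_rule(rule, request) for rule in rules]
    if Decision.DENY in decisions:
        return Decision.DENY
    if Decision.PERMIT in decisions:
        return Decision.PERMIT
    return Decision.NOT_APPLICABLE
```

In a deployment like the one on the slide, the request attributes would come from SAML assertions and additional PIPs, and the decision would be returned to the enforcement point that guards the service.
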
Species Distribution Maps Generation
  • AquaMaps is an application* tailored to predict global distributions of marine species
      - initially designed for marine mammals and subsequently generalised to marine species
      - generates color-coded species range maps using half-degree latitude and longitude blocks
      - interfaces several databases and repository providers
* Algorithm by Kaschner et al. 2006

Species Distribution Maps Generation
  • AquaMaps execution is based on the gCube Ecological Niche Modelling Suite, which allows the extrapolation of known species occurrences
      - to determine environmental envelopes (species tolerances)
      - to predict future distributions by matching species tolerances against local environmental conditions (e.g. climate change and sea pollution)
  • Very large volume of input and output data: HSPEC native range 56,468,301; HSPEC suitable range 114,989,360
  • Very large number of computations: one multispecies map computed on 6,188 half-degree cells (over 170k) and 2,540 species requires 125 million computations
(Eli E. Agbayani, FishBase Project/INCOFISH WP1, WorldFish Center)

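A short calculation helps put the figures above in context, and a small function shows what "matching species tolerances against local environmental conditions" could look like. Both are sketches under assumptions: the slide does not say how the 125 million figure is derived, and the trapezoidal envelope with four thresholds per parameter, multiplied across parameters, is an assumed scoring scheme with hypothetical names and values.

```python
import math

# Figures quoted on the slide.
cells = 6_188
species = 2_540
stated_total = 125_000_000

# Cell/species pairs, and the number of evaluations per pair that the
# stated total would imply (an inference, not a figure from the slide).
pairs = cells * species
per_pair = stated_total / pairs


def envelope_score(value, minimum, pref_min, pref_max, maximum):
    """Trapezoidal suitability: 0 outside [minimum, maximum], 1 inside
    the preferred range, linear in between (assumed scheme)."""
    if value < minimum or value > maximum:
        return 0.0
    if pref_min <= value <= pref_max:
        return 1.0
    if value < pref_min:
        return (value - minimum) / (pref_min - minimum)
    return (maximum - value) / (maximum - pref_max)


def cell_suitability(conditions, envelopes):
    """Multiply the per-parameter scores of one cell for one species."""
    return math.prod(
        envelope_score(conditions[name], *bounds)
        for name, bounds in envelopes.items()
    )
```

The pair count is 15,717,520, so the stated 125 million corresponds to roughly eight evaluations per cell/species pair, which would be consistent with scoring several environmental parameters for each pair. A call such as `cell_suitability({"temperature": 12.0}, {"temperature": (10, 14, 20, 24)})` scores one cell against one hypothetical envelope.
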
Time Series Management
  • Offers a set of tools to manage capture statistics
  • Supports the complete TS lifecycle
  • Supports validation, curation, and analysis
  • Provides support for data reallocation
  • Produces uniform data sets

Time Series and R statistical software integration
The main aims are to:
  • provide a complete, fully working environment for the R language
  • give users methods to automatically extract data from the time series they are working on
  • give users the possibility to perform queries on the time series database
  • provide a service distributed on the infrastructure: multiple instances can be managed on the infrastructure VREs, the distribution being transparent to the users (SaaS model)

Conclusions
  • gCube System: stable software that has been improved over the last 5 years (end of DILIGENT -> D4Science -> D4ScienceII)
  • gCube offers a variety of patterns, tools, and solutions
      - to deliver interoperability solutions and interconnect
          heterogeneous digital content,
          heterogeneous repository systems,
          heterogeneous computation platforms
      - to decrease the cost of adoption
      - to deal with several standards

Questions Time


Editor's Notes

  • #6: Scenarios: 1) TS curation; 2) ?; 3) expert validation of the AquaMaps maps
  • #7: VRE resources can be published in the VO at any time by the VRE data managers.
  • #8: Interoperability is among the most critical issues to be faced when building systems as "collections" of independently developed constituents (systems in their own right) that should co-operate and rely on each other to accomplish larger tasks. Unfortunately, interoperability is a multi-faceted and very challenging problem. Interoperability issues arise whenever two (or more) systems are willing to "share" a certain resource (whatever it is), and one of the systems plays the role of "provider" of the resource while the other plays the role of "consumer" of this resource. The multiple facets result from the fact that there are multiple barriers hindering the involved systems from sharing a common understanding of the resource that is the target of the interoperability scenario. These barriers range from different models of the resource to different protocols and APIs to access the resource and interact with it, different policies (and policy models) governing the resource consumption, etc.
  • #9: Different Interoperability approaches
  • #11: Resources are accessible from a common ecosystem of resources, and different communities can access different ecosystem views. Communities define their VREs (transparently to the providers, which could also be cloud systems). VREs compete for the same resources (e.g. indexes or storage).
  • #12: Blackboard-based (Information System); wrapper/mediator-based (CM); adaptor-based (PES adaptors over Condor, grid, and Hadoop map-reduce)
  • #14: OCMA is an open, WSRF-compliant architecture for gCube content management services. OCMA defines a design pattern for such services and, by contextualisation of the pattern, their role in a gCube infrastructure.
    Requirements and Assumptions
    OCMA acknowledges that gCube is concerned with content that may:
      - be hosted inside or outside a gCube infrastructure;
      - be described with a variety of models, for different media, and with different degrees of structure;
      - be accessed with a variety of protocols.
    OCMA makes only the following assumptions about content:
      - content is created, accessed, and distributed in units called documents;
      - documents are grouped in collections;
      - collections are hosted in local management systems called repositories.
    Finally, OCMA acknowledges that content management in gCube needs to:
      - embrace heterogeneity, i.e. support simultaneously multiple locations, protocols, and models;
      - hide heterogeneity, i.e. abstract over differences in location, protocol, and model;
      - scale, i.e. retain good throughput under heavy load.
  • #15: OCMA is the architecture; the CM service is its flagship service. A storage back-end R may already offer a native T-interface; in this case OCMA relies on a wrapper for R. A storage back-end R may offer a different T'-interface; in this case OCMA relies on an adapter for R.
  • #22: Composition of: data access to external infrastructures (AquaMaps); data processing on Hadoop; data processing on gLite
  • #23: In statistics, signal processing, econometrics, and mathematical finance, a time series is a sequence of data points, measured typically at successive times spaced at uniform time intervals. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.
  • #24: R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.