Sessions 43 & 44
   Accessing data using a common
interface: OGSA-DAI as an example
               Elias Theocharopoulos and Tilaye Alemu

   ISSGC ‘09 – Sophia Antipolis – Tuesday, 14th July 2009




        web: www.omii.ac.uk              email: info@omii.ac.uk
Overview
•   The problem: Sharing data in a grid
•   What is OGSA-DAI?
•   Data-centric workflows
•   Key OGSA-DAI terms
•   The OGSA-DAI client toolkit
•   Use cases and extensibility points
•   Pros and cons



The problem:
Sharing and accessing
    data in a grid

Distributed data resources




How about a central server?




                     FR               FR
                    query             data


                             Client



Central server pros and cons
•   Access to up-to-date data
•   Single point of access
•   Data in common format
•   Database can handle joins

• Initial overhead in terms of time, effort and cost
• Keeping data up to date
• Loss of control by data providers
    o   Assuming they even let go
• Security and trust



How about providing direct access?



 UK       UK                 ES          ES                IA             IA
query     data              query        data             query          data




  Translate and join                Client



Direct access pros and cons
• Access to up-to-date data
• Fast access
• Data providers retain control

• Fat clients
• Heterogeneity and inconsistency
    o   Data
    o   Databases
    o   Connection
    o   Security
• Security overheads for data providers
    o   Manage firewalls and usernames/passwords for multiple clients
• Hard to use in grid/web service workflows




How about providing a ZIP on the web?



  UK data                         ES data                        IA data


HTTP        ZIP               HTTP          ZIP              HTTP          ZIP
 GET                           GET                            GET


  UnZIP, translate and join             Client




ZIP on the web pros and cons
• Fast access
• Data providers retain control

• Very large downloads even if client only needs
  subset
• Providers have to select and ZIP their data
• Client has to install data into a local database
• Static snapshot




Sharing distributed heterogeneous resources with OGSA-DAI




 UK          UK                  ES              ES                   IA
query        data               query            data                query    IA
                                                                             data
        Translate and join           OGSA-DAI

                                 FR              FR
                                query            data



                                        Client




Motivation
• Grid is about sharing resources
• Need to share structured data resources
                                      Relational
                                      Database




                                        XML
                                      Database




                                       Indexed
                                         File


What is OGSA-DAI?
• Open Grid Services Architecture Data Access and
  Integration
• A framework that executes workflows
• Workflows are data-centric
• Workflow components are designed for data
  access, integration, transformation and
  delivery
• Can access heterogeneous data resources
• Web service interface
• Intended as a toolkit for building higher-level
  application-specific data services

OGSA-DAI’s vision
• Sharing data resources to enable collaboration
• Data access
   o   Structured data in distributed heterogeneous data resources
• Data integration
   o   e.g. expose multiple databases to users as a single virtual database
• Data transformation
   o   e.g. expose data in schema X to users as data in schema Y
• Data delivery
   o   To where it’s needed by the most appropriate means
   o   e.g. web service, e-mail, HTTP, FTP, GridFTP




OGSA-DAI and data-centric workflows




OGSA-DAI workflow
• Executes workflows

• Workflows contain activities
  o   Well-defined functional units
  o   Data goes in, something is done, data comes out
  o   Equivalent to programming language methods


• Workflows are submitted by clients
  o   To an OGSA-DAI web service
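
The "data goes in, something is done, data comes out" idea can be sketched in plain Java by modelling each activity as a function over rows and a workflow as their composition. This is only an illustration under that assumption: `ActivityChainSketch`, `QUERY` and `TO_SPANISH` are hypothetical names, not part of OGSA-DAI, and the query stage returns fixed sample rows instead of calling a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an "activity" modelled as a function from rows to rows.
// None of these names belong to the OGSA-DAI API.
class ActivityChainSketch {

    // Stand-in for a query activity: ignores its input and emits fixed rows.
    static final Function<List<String[]>, List<String[]>> QUERY =
            ignored -> List.of(new String[] {"Spain", "Madrid"},
                               new String[] {"Italy", "Rome"});

    // Transformation activity: translates both columns from English to Spanish.
    static final Function<List<String[]>, List<String[]>> TO_SPANISH = rows -> {
        Map<String, String> dictionary = Map.of(
                "Spain", "España", "Italy", "Italia", "Rome", "Roma");
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            out.add(new String[] {
                    dictionary.getOrDefault(row[0], row[0]),
                    dictionary.getOrDefault(row[1], row[1])});
        }
        return out;
    };

    // A workflow is the composition of its activities.
    static List<String[]> runWorkflow() {
        return QUERY.andThen(TO_SPANISH).apply(List.of());
    }

    public static void main(String[] args) {
        for (String[] row : runWorkflow()) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```

Because each stage has the same shape (rows in, rows out), further stages could be appended with another `andThen` call.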



An OGSA-DAI workflow - a simple analogy
Client query (in French):
    SELECT Pays, Capital FROM Pays

Branch 1
  1. Convert query from French to English
         SELECT Country, Capital FROM Countries
  2. Run SQL query
         Country   Capital
         UK        London
         France    Paris
  3. Convert data from English to French
         Pays              Capital
         Grande-Bretagne   Londres
         France            Paris

Branch 2
  1. Convert query from French to Spanish
         SELECT País, Capital FROM Países
  2. Run SQL query
         País      Capital
         España    Madrid
         Italia    Roma
  3. Convert data from Spanish to French
         Pays        Capital
         l'Espagne   Madrid
         l'Italie    Rome

Join the data
         Pays              Capital
         Grande-Bretagne   Londres
         France            Paris
         l'Espagne         Madrid
         l'Italie          Rome



How it appears to the client




                                 OGSA-DAI

workflow(SELECT Pays,Capital                  Pays                   Capital
        FROM Pays)
                                              Grande-Bretagne        Londres
                                              France                 Paris
                                   Client
                                              l'Espagne              Madrid
                                              l'Italie               Rome



A query-transform-update example

  1. Run SQL query
         Country   Capital
         Spain     Madrid
         Italy     Rome
  2. Convert data from English to Spanish
         País      Capital
         España    Madrid
         Italia    Roma
  3. Run SQL update




A query-transform-join example
Branch 1 (relational database)
  1. Run SQL query
         Country   Capital
         UK        London
         France    Paris
  2. Convert data from English to French
         Pays              Capital
         Grande-Bretagne   Londres
         France            Paris

Branch 2 (file)
  1. Read file
         FIELDS
         País,Capital
         ENTRY=1
         España,Madrid
         ENTRY=2
         Italia,Roma
  2. Convert data from file to relational
         Pays        Capital
         l'Espagne   Madrid
         l'Italie    Rome
  3. Convert data from Spanish to French
         Pays        Capital
         l'Espagne   Madrid
         l'Italie    Rome

Join
         Pays              Capital
         Grande-Bretagne   Londres
         France            Paris
         l'Espagne         Madrid
         l'Italie          Rome
Data integration with OGSA-DAI workflows
• Across OGSA-DAI services

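  o   Workflow 1, sent to the OGSA-DAI service in front of DB1:
        SQLQuery (DB1) -> Deliver to OGSA-DAI
  o   Workflow 2, sent to the OGSA-DAI service in front of DB2:
        Receive from OGSA-DAI + SQLQuery (DB2) -> JOIN -> Deliver
  o   Data flows from the first OGSA-DAI service to the second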


Key OGSA-DAI terms:
 activities, resources,
       workflows

OGSA-DAI: Key Term Activity
• An activity is a named unit of functionality
   o   A well defined workflow unit
   o   Pluggable
   o   Composable
• An activity can have
   o   0 or more named inputs
   o   0 or more named outputs
• Blocks of data flow from an activity’s output into
  another activity’s input


OGSA-DAI: Key Term Activity (cont.)
• Example activities include
  o   Execute an SQL query
  o   ZIP a batch of data
  o   List the files in a directory
  o   Execute an XSL transform on an XML document
  o   Deliver data to an FTP server




OGSA-DAI: Key Term Activity (cont.)
• Activity Connections
  o   All required inputs must be connected
  o   All outputs must be connected
  o   Optional inputs


• Inputs
  o   Literal
  o   Streamed
  o   Types


Connecting activities - examples




Data grouping: Lists
• Special blocks are used to mark the beginning and the
  end of a list.
• A list groups related data as one unit.
       f1,f2  ->  ReadFromFileActivity  ->  [byte[]…], [byte[]…]
• For example ReadFromFileActivity can dynamically
  take any number of filenames as input.
   o   Without a way to group the output byte arrays we would
       have no way to differentiate between the binary data of
       filenames f1 and f2.
   o   Streaming is preserved: each file is still produced as a number
       of byte arrays, which are forwarded to subsequent activities.
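
One way to picture list markers is as sentinel blocks in an otherwise flat stream. The sketch below is a plain-Java illustration under that assumption; `LIST_BEGIN`, `LIST_END` and `group` are hypothetical names, not the OGSA-DAI types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of list markers in a block stream; the marker objects
// and method names are illustrative, not the OGSA-DAI API.
class ListMarkerSketch {
    static final Object LIST_BEGIN = new Object();
    static final Object LIST_END = new Object();

    // Regroups a flat stream of blocks into one list of blocks per
    // LIST_BEGIN ... LIST_END pair.
    static List<List<Object>> group(List<Object> stream) {
        List<List<Object>> lists = new ArrayList<>();
        List<Object> current = null;
        for (Object block : stream) {
            if (block == LIST_BEGIN) {
                current = new ArrayList<>();
            } else if (block == LIST_END) {
                if (current != null) {
                    lists.add(current);
                }
                current = null;
            } else if (current != null) {
                current.add(block);
            }
        }
        return lists;
    }

    public static void main(String[] args) {
        // Two files: f1 arrives as two chunks, f2 as one.
        List<Object> stream = List.of(
                LIST_BEGIN, new byte[] {1, 2}, new byte[] {3}, LIST_END,
                LIST_BEGIN, new byte[] {4}, LIST_END);
        List<List<Object>> files = group(stream);
        System.out.println(files.size() + " files, first has "
                + files.get(0).size() + " chunks");
    }
}
```

The consumer can tell where f1 ends and f2 begins even though every chunk is an anonymous byte array.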

Passing data internally: OGSA-DAI Tuple
• A special type of data passed between activities
• A Tuple is a data representation similar to a row
  of relational data. Each element of a Tuple
  represents a column.
• Tuples are normally grouped in lists and they are
  preceded by a metadata block.

   SELECT city, temp FROM weather;  ->  SqlQuery  ->  (Athens, 20), (Madrid, 22), (Rome, 25)

An interesting activity: Tee
• Some activities operate at the level of blocks and
  are not concerned with the type or values of the
  data they handle, e.g. TeeActivity:

                                            [A,B,C,D]
 [A,B,C,D]
                        TeeActivity         [A,B,C,D]




             No of outputs: 2
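
The block-level behaviour of a tee can be sketched in a few lines of plain Java. This is an illustration only, not the OGSA-DAI TeeActivity implementation; `TeeSketch` and `tee` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a tee: every input block is copied, unchanged and
// uninspected, to each output. Not the OGSA-DAI TeeActivity implementation.
class TeeSketch {
    static <T> List<List<T>> tee(List<T> input, int numberOfOutputs) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < numberOfOutputs; i++) {
            outputs.add(new ArrayList<>());
        }
        for (T block : input) {
            for (List<T> output : outputs) {
                output.add(block);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        System.out.println(tee(List.of("A", "B", "C", "D"), 2));
    }
}
```

The method is generic in `T` precisely because a tee never looks inside a block.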

OGSA-DAI: Key Term Resource
• Data request execution resource (DRER)
• Data resources
• Data sources
• Data sinks
• Sessions
   o A state container associated with a set of workflows

   o One workflow can lodge state

   o A subsequent workflow can retrieve it


• Requests
   o One per workflow submitted to a DRER

   o Access request status




OGSA-DAI: Key Term Workflow
                      • A workflow can contain:
                         o Activities


                            • Resource-based: SQLQuery
                            • Non-Resource: Transformation
                              and Delivery
                         o Resources


                            • Targeted by Activities
                         o Other Workflows


                            • Sub workflows
                            • Other types of workflow



OGSA-DAI: Key Term Workflow (cont.)
• OGSA-DAI can be used as a workflow
  processing system that is designed to stream
  data through a set of activities in a pipelined
  manner.
• In the Query->Transform->Deliver workflow, if
  the activities are well defined all three will be
  processing concurrently with different portions
  of the data stream.



OGSA-DAI: Key Term Workflow (cont.)
• Pipeline workflow: a set of chained activities that are executed in
  parallel, with data flowing between the activities.
• Sequence workflow: all the sub-workflows added to this workflow are
  executed in sequence. For example, the first sub-workflow creates a
  table and the second bulk-loads transformed data into this table.
• Parallel workflow: all the sub-workflows added to this workflow are
  executed in parallel.
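
The pipeline case can be sketched with two threads and a queue in plain Java; the sequence and parallel cases are not shown. This is a sketch under stated assumptions, not OGSA-DAI code: `PipelineSketch`, the `END` marker and the upper-casing "transform" are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of pipelined execution: two stages run in their own
// threads and exchange blocks through a queue. Names are illustrative only.
class PipelineSketch {
    // End-of-stream marker; assumed never to occur as real data.
    private static final String END = "\u0000END";

    static List<String> run(List<String> source) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> delivered = new ArrayList<>();

        // Stage 1 ("query"): emits blocks one at a time.
        Thread producer = new Thread(() -> {
            for (String block : source) {
                queue.add(block);
            }
            queue.add(END);
        });

        // Stage 2 ("transform and deliver"): starts work as soon as the
        // first block arrives, without waiting for stage 1 to finish.
        Thread consumer = new Thread(() -> {
            try {
                for (String b = queue.take(); !b.equals(END); b = queue.take()) {
                    delivered.add(b.toUpperCase(Locale.ROOT));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("london", "paris", "madrid")));
    }
}
```

Because the queue preserves order, the delivered blocks come out in the order they were produced even though both stages run concurrently.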

Getting to the first practical:
   The OGSA-DAI client
           toolkit.

OGSA-DAI client toolkit
• OGSA-DAI client toolkit
  o   Construct and submit requests in Java not XML
       • Toolkit manages interaction with web services via SOAP
         over HTTP; it handles SOAP request construction and
         response parsing.
  o   Provides Java abstractions of
       •   Services
       •   OGSA-DAI resources and properties
       •   Requests
       •   Activities




The client toolkit
• The workflow description is sent to the OGSA-DAI
  server as an XML document.
• Application developer does not need to worry about
  creating this document.
• The client toolkit provides ways of assembling activity
  workflows programmatically.

• We will see how to use the client toolkit during the
  hands-on session.



Service/resource model

Client
  -> Data Request Execution Service
       -> Data Request Execution Resource (MyDRER)
            -> Data Resource One    -> Data
            -> Data Resource Two    -> Data
            -> Data Resource Three  -> Data
  -> Request Management Service
       -> Sessions
       -> Request (MyRequest123456)



Client Toolkit Activities
• One client activity per server activity
• Same input and output names
• Plus some convenience methods
For example:
• Retrieve results as a JDBC ResultSet from a
  TupleToWebRowSet activity.
• Retrieve update count as an Integer from a
  SQLUpdate activity


Step by Step Guide for Writing Clients
• Create activities
   o   There’s a corresponding client toolkit activity for each
       server-side activity


       DeliverToFTP deliver = new DeliverToFTP();
       ReadFromFile readFile = new ReadFromFile();




Connecting activities
• Set inputs for each activity (e.g. parameters)
• Every input parameter can either be literal input
  or streamed from another activity
  o   Literal inputs, e.g. for constant parameters:

      deliver.addFilename("results1.txt");
      deliver.addHost("anonymous@test.ogsadai.org.uk:21");

  o   Connect input to the output of another activity to
      stream data

      deliver.connectDataInput(readFile.getDataOutput());

Gaining access to the results
• If the output of an activity can be provided in a
  user-friendly type, then there are methods to
  access the results:
   o   Check whether there are more results to be retrieved

       boolean hasNext = sqlUpdate.hasNextResult();


   o   Get the next result in a convenient type

       int count = sqlUpdate.getNextResult();


Build and execute the Workflow Request
• Create a workflow and add activities to it
• A data service executes the workflow and
  returns a response (or an error!)
• The response may contain data (depending on
  the activities)
• Each client toolkit activity provides utility
  methods for retrieving its response data




First hands-on session




                                Go to :
http://homepages.nesc.ac.uk/~elias/issgc09/html/practical.html




Extensibility points &
    components


Extending OGSA-DAI: What

• OGSA-DAI
  o   A Framework
  o   Extensible
• Out of the box it provides the basics
  o   Different applications have different needs
  o   New Sources of Data
  o   New Functionality




Extending OGSA-DAI: Overview
• Presentation Layer
  o   OMII, GT, Axis, UNICORE, WS-DAI, gLite, Embedded
  o   New message frameworks (?)
• OGSA-DAI Core
  o   Workflow Execution Engine
  o   Activity Framework
       • SQLQuery, XPathQuery, XSLTransform, DeliverToURL
       • MyOwnActivity (new functionality)
  o   Persistence and Configuration
  o   Sessions, Request, Data Source, Data Sink
• Data Resources
  o   New types of data
Extending OGSA-DAI: Activities


• Activities do some unit of work
• Specific transformation
  o   Data Format: SWISS-PROT to format X
• Delivery
  o   Deliver to a target service
• Data analysis and Integration
  o   Combine data from different sources



Extending OGSA-DAI: Resources
• New resources – why?
  o   New Products
  o   New Applications
  o   Specialised Access


• Required:
  o   DataResource
  o   DataResourceState
  o   ResourceAccessor


Extending OGSA-DAI: Remote Resource




• Accessing Resources on Remote OGSA-DAI
• Avoid replication of resources
• Security Issues
  o   Devolved to Local OGSA-DAI
  o   Security between OGSA-DAI Deployments



SQL views
• Define a drPatient view
    o   SELECT patient.id, patient.name, age, sex, doctor.name AS drName
        FROM patient, doctor WHERE patient.DrID = doctor.ID;

  patient
  ID  Name   Age  Sex  ZIP        Dr ID
  1   Ken    42   M    IL1478305  456
  2   Josie  25   F    BN1 7QP    789

  doctor
  ID   Name      DN
  123  Greene    US-Chicago-G
  456  Ross      US-Chicago-R
  789  Fairhead  UK-Holby-F

• Client runs SELECT * FROM drPatient;
• Shorthand for complex query results
• Data access control e.g. users of drPatient
    o   Cannot access a patient’s ZIP
    o   Are unaware of the doctor or patient tables


OGSA-DAI SQL views
• OGSA-DAI SQL views data resource
  o   Represents a view across a database exposed by an
      OGSA-DAI relational resource
• SQLQuery activity
  o   Parses query
  o   Splices in view definition
  o   Submits transformed query to database
• Can define views for read-only databases
• Schema transformation
  o   Map a logical schema to a physical schema



OGSA-DAI SQL views and security
• Factor in client’s security credentials
• e.g. define drPatient view as
   o   SELECT patient.* FROM patient, doctor WHERE
       patient.DrID = doctor.ID AND doctor.DN = $DN$;
• Replace $DN$ by client’s DN provided by Grid
  security components
• Doctors can only view their own patients
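
Replacing the `$DN$` placeholder might look like the plain-Java sketch below. It assumes the DN is inserted as a quoted SQL string literal and that doubling single quotes is sufficient escaping for the target database; `DnSubstitutionSketch` and `bindDn` are hypothetical names, not OGSA-DAI code.

```java
// Hypothetical sketch of substituting the client's DN into a view definition.
class DnSubstitutionSketch {
    static String bindDn(String viewDefinition, String clientDn) {
        // Double single quotes so the DN cannot break out of the literal.
        String literal = "'" + clientDn.replace("'", "''") + "'";
        return viewDefinition.replace("$DN$", literal);
    }

    public static void main(String[] args) {
        String view = "SELECT patient.* FROM patient, doctor "
                + "WHERE patient.DrID = doctor.ID AND doctor.DN = $DN$";
        System.out.println(bindDn(view, "US-Chicago-R"));
    }
}
```

The DN must come from the security layer rather than from the client's request, otherwise a client could simply claim to be another doctor.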




Distributed query processing
• OGSA-DQP
   o   Developed by Universities of Manchester and Newcastle
   o   Refactored for OGSA-DAI 3.0 by EPCC as part of the NextGrid project
   o   OGSA-DAI DQP package
• Multiple tables on multiple databases are exposed to clients as
  multiple tables in one “virtual database”
• Clients are unaware of the multiple databases
• Databases can be exposed
   o   EITHER within one OGSA-DAI server
   o   OR via multiple remote OGSA-DAI servers




OGSA-DAI DQP
1: Client submits a query to OGSA-DAI (core + DQP coordinator)
       SELECT Archeo_Finds.ID, Archeo_Finds.Provenance,
              Annotations_Ratings.Confidence
       FROM Annotations_Ratings, HGV_June
       WHERE Annotations_Ratings.Confidence > 0.99
         AND Annotations_Ratings.ID = Archeo_Finds.ID;
2: Parse query and form query plan
3: Execute sub-queries, OGSA-DAI (DQP query evaluator)
   3a: sent to one OGSA-DAI server
       SELECT Archeo_Finds.ID, Archeo_Finds.Provenance
       FROM Archeo_Finds;
   3b: sent to another OGSA-DAI server
       SELECT Annotations_Ratings.ID, Annotations_Ratings.Confidence
       FROM Annotations_Ratings
       WHERE Annotations_Ratings.Confidence > 0.99
4: Push results
5: Combine and post-process: do the JOIN
5: Results returned to the client


OGSA-DAI workflows – a de-facto standard
• OGSA-DAI workflows are a de-facto standard
   o   Of use to many projects as we’ll see
• For some applications workflows are too powerful
   o   Too expressive
   o   Infer semantics from names of activities available on server
        • Must interrogate the server
   o   Problems using OGSA-DAI services in workflow engines e.g.
       Taverna
   o   Not compatible with existing data analysis tools




Facades
• Define facades on top of OGSA-DAI
• Why?
   o   Provide interfaces with more tightly-defined semantics
   o   Comply with standards
   o   Exploit existing data analysis tools
• Continue to exploit the power of workflows under-the-
  hood
   o   “Canned workflows”
   o   Templates selected and populated, executed and parsed
   o   Map service operations to “template” OGSA-DAI workflows




Grid-enabling existing data-related products




                                OGSA-DAI



                         OGSA-DAI mediator



                           Data analysis tool




OGSA-DAI in action




VOTES – data with different schema distributed across multiple
databases within a group of strategic partners

• Virtual Organisations for Trials and Epidemiological
  Studies (VOTES)
   o   http://labserv.nesc.gla.ac.uk/projects/votes/index.html
   o   UK Medical Research Council project
• Data access and integration in the clinical domain
   o   Relational databases – Microsoft SQL Server, Access, …
   o   Distributed database joins
        • Patient information
        • Clinical trials records
   o   Linking key is Scotland’s CHI number




VOTES – cross-database join activity
Workflow submitted to one OGSA-DAI service in front of DB1 and DB2:

  SQLQuery (DB1)   SELECT CHI, Sex, DOB FROM Patients ORDER BY CHI
                   -> (CHI, Sex, DOB)
  SQLQuery (DB2)   SELECT CHI, Diagnosis FROM TrialX ORDER BY CHI
                   -> (CHI, Diagnosis)
  Merge Join       of the two ordered data streams
                   -> (CHI, Sex, DOB, Diagnosis)
  Deliver



• This is equivalent to running:
    SELECT chi, sex, DOB, diagnosis FROM patients, trialX WHERE
                            patients.chi = trialX.chi;
• patients and trialX are in two different databases
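A merge join over two streams that are already ordered on the key can be sketched as follows. This is a minimal illustration in plain Java, not the OGSA-DAI activity itself: the class and method names are hypothetical, rows are plain string arrays with the key in column 0, and the sample CHI values and records are made up. It assumes keys are unique in the left stream and that both streams are sorted in the same order that `String.compareTo` uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Sketch of a merge join over two row streams ordered on column 0. */
final class MergeJoinSketch {

    /** Assumes left keys are unique; the right stream may repeat a key. */
    static List<String[]> mergeJoin(Iterator<String[]> left, Iterator<String[]> right) {
        List<String[]> joined = new ArrayList<>();
        String[] l = next(left);
        String[] r = next(right);
        while (l != null && r != null) {
            int cmp = l[0].compareTo(r[0]);
            if (cmp < 0) {
                l = next(left);            // left key has no partner yet, move on
            } else if (cmp > 0) {
                r = next(right);           // right key has no partner, move on
            } else {
                // Keys match: emit the left row followed by the right row minus its key.
                String[] row = Arrays.copyOf(l, l.length + r.length - 1);
                System.arraycopy(r, 1, row, l.length, r.length - 1);
                joined.add(row);
                r = next(right);           // keep the left row in case the key repeats
            }
        }
        return joined;
    }

    private static String[] next(Iterator<String[]> it) {
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for Patients (DB1) and TrialX (DB2).
        List<String[]> patients = Arrays.asList(
            new String[] {"0001", "F", "1970-01-01"},
            new String[] {"0002", "M", "1980-02-02"});
        List<String[]> trial = Arrays.asList(
            new String[] {"0002", "Asthma"},
            new String[] {"0003", "Flu"});
        for (String[] row : mergeJoin(patients.iterator(), trial.iterator())) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Because each side is consumed strictly in order, only the current row of each stream is held in memory, which is why the two queries carry an ORDER BY on the linking key.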



Public Health Grid – data with different schemas distributed across multiple
databases within a group of strategic partners

 • US Public Health Grid
     o   US Centers for Disease Control
     o   University of Pittsburgh
     o   Tarrant County Public Health Department
     o   Dallas County Public Health Department
 • Real-time Outbreak and Disease Surveillance
     o   Health query system
     o   Look for a rising incidence of a disease across an area
     o   Historical and live data
 • Health centres maintain their own databases
     o   Distributed databases
     o   Different products and schemas
          • e.g. PatientID, Id, PatientIdentifier, PatientNumber
     o   Security and privacy are important
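The schema differences listed above can be hidden by mapping each site's column name to one logical name. The sketch below is a hedged illustration only: the class is hypothetical, the choice of "PatientID" as the canonical name is an assumption, and rows are modelled as simple column-to-value maps rather than any OGSA-DAI type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: present differently named identifier columns under one logical name. */
final class ColumnNameMapper {
    // The variants come from the slide; using "PatientID" as the logical name is an assumption.
    private static final Set<String> PATIENT_ID_VARIANTS =
        Set.of("PatientID", "Id", "PatientIdentifier", "PatientNumber");

    /** Returns a copy of the row with any identifier variant renamed to PatientID. */
    static Map<String, String> normalise(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> col : row.entrySet()) {
            String name = PATIENT_ID_VARIANTS.contains(col.getKey()) ? "PatientID" : col.getKey();
            out.put(name, col.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up row from a site that calls the column PatientNumber.
        System.out.println(normalise(Map.of("PatientNumber", "42")));
    }
}
```

A transformation step like this lets every health centre keep its own schema while clients query a single logical one.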



Public Health Grid – workflows, DQP and views


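[Diagram: a workflow is submitted to OGSA-DAI; the diagram shows OGSA-DAI, two OGSA-DAI View resources, OGSA-DQP and databases DB1 to DB6]

 • Query run by the SQLQuery (DB6) activity:
     SELECT zip, count(*) AS total
     FROM Cases
     WHERE Reason = 'Flu'
     GROUP BY zip
     ORDER BY zip
 • Cases is defined as a view:
     SELECT * FROM DB1.Cases UNION
     SELECT * FROM DB2.Cases UNION
     SELECT * FROM DB4.Cases
 • Example result rows: (15112, 3), (15144, 1)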
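What the Cases view and the grouped query compute together can be sketched in plain Java. This is an illustration under assumptions, not how OGSA-DAI views or OGSA-DQP are implemented: the class is hypothetical, each row is a list of (caseId, zip, reason) where the caseId column is invented so that rows stay distinct, and the sample data is made up to reproduce the two result rows shown on the slide.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: union three Cases tables, then count Flu cases per zip. */
final class CasesViewSketch {

    /** Each row is (caseId, zip, reason); caseId is a hypothetical column. */
    static Map<String, Long> fluCountsByZip(List<List<String>> db1,
                                            List<List<String>> db2,
                                            List<List<String>> db4) {
        return Stream.of(db1, db2, db4)
            .flatMap(List::stream)                      // the view: rows from all three sources
            .distinct()                                 // UNION drops duplicate rows
            .filter(row -> "Flu".equals(row.get(2)))    // WHERE Reason = 'Flu'
            .collect(Collectors.groupingBy(
                row -> row.get(1),                      // GROUP BY zip
                TreeMap::new,                           // sorted keys give ORDER BY zip
                Collectors.counting()));                // count(*)
    }

    public static void main(String[] args) {
        Map<String, Long> totals = fluCountsByZip(
            List.of(List.of("c1", "15112", "Flu"), List.of("c2", "15112", "Cold")),
            List.of(List.of("c3", "15112", "Flu"), List.of("c4", "15144", "Flu")),
            List.of(List.of("c5", "15112", "Flu")));
        System.out.println(totals); // prints {15112=3, 15144=1}
    }
}
```

The client writes one query against Cases and never needs to know that the rows come from three separate databases.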




SEE-GEO – working with private and public data

• SEcurE access to GEOspatial services
   o   http://edina.ac.uk/projects/seesaw/seegeo/index.html
   o   EDINA, MIMAS, NeSC, NCeSS
   o   UK JISC project
• Geographical information systems
• Virtual integration of and access control to
   o   Census data – geo-data access service
   o   Borders data – web feature service
   o   Data hosted by other organisations and exposed as
       services



SEE-GEO – geo-linking service portal

[Diagram: GLS Portal, an OGSA-DAI workflow (Get, Get, Join, Transform, Deliver), an Image Creation Service and a map server; the data sources are MIMAS Census and UK BORDERS]

  1: GLSQuery submitted via portal, e.g. “Leeds population distribution by census output area”
  2: Workflow is populated with query parameters and run
  3: Image is placed on a map server
  4: URL of image is returned to portal (avoids costly SOAP/HTTP transfer of image)
  5: Portal gets image using URL
What did OGSA-DAI give SEE-GEO?
• The GLS could have been implemented without
  OGSA-DAI
• But using OGSA-DAI allowed SEE-GEO to leverage
  o   Workflow engine
  o   Out-of-the-box activities for
       • Queries
       • Delivery
  o   Security
  o   Other grid technologies, e.g. GridFTP



What did OGSA-DAI give SEE-GEO? (cont.)
• A toolkit to
   o Develop domain-specific activities
   o Develop support for domain-specific data resources
   o Execute workflows using these
   o Build OGC Web Processing Services (WPS)

• Relatively little effort to
   o Choose different data resources dynamically
   o Merge GDAS XML into a relational data resource
   o Transfer data using GridFTP
   o Protect data using GSI
   o Experiment!



Why OGSA-DAI?




Workflows
• A workflow can represent a complex data
  management scenario, involving:
  o   Data access
  o   Transformation
  o   Filtering
  o   Updating
  o   Numerous distributed, heterogeneous databases




Workflows and performance
• OGSA-DAI is one more layer between clients
  and data
• Therefore, OGSA-DAI is not as fast as a direct
  connection to a database
  o   OGSA-DAI uses JDBC so will never be as fast as a
      direct JDBC connection
• But matching the speed of a direct connection is not
  what OGSA-DAI is designed to do




Workflows and performance
• Having a server execute workflows yields
   o   Thinner clients with lower memory and CPU requirements
   o   Minimised client-server communication overheads
• Activities process data on the server
   o   Minimises data movement
   o   As opposed to BPEL or Taverna or web service-based workflow
       engines which pass data to and fro via web services
• Data streaming
   o   Activities work on different parts of the data stream in parallel
   o   Reduces memory footprint on server
   o   Reduces execution time
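The data streaming point can be illustrated with three stages connected by small bounded queues. This is a sketch under assumptions, not the OGSA-DAI engine: the class name, the use of one thread per stage and the upper-casing "transform" are all hypothetical stand-ins for Query, Transform and Deliver activities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Sketch: three stages run concurrently on different blocks of one stream. */
final class StreamingPipelineSketch {
    // Unique end-of-stream marker, compared by identity so no data block can equal it.
    private static final String END = new String("END");

    static List<String> run(List<String> source, int capacity) {
        // Bounded queues cap how many blocks are held in memory between stages.
        BlockingQueue<String> queried = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> transformed = new ArrayBlockingQueue<>(capacity);
        List<String> delivered = new ArrayList<>();

        Thread query = new Thread(() -> {
            for (String block : source) put(queried, block);
            put(queried, END);
        });
        Thread transform = new Thread(() -> {
            for (String b = take(queried); b != END; b = take(queried)) {
                put(transformed, b.toUpperCase(Locale.ROOT));
            }
            put(transformed, END);
        });
        Thread deliver = new Thread(() -> {
            for (String b = take(transformed); b != END; b = take(transformed)) {
                delivered.add(b);
            }
        });

        List<Thread> stages = List.of(query, transform, deliver);
        for (Thread t : stages) t.start();
        for (Thread t : stages) join(t);   // join makes the delivered list safe to read
        return delivered;
    }

    private static void put(BlockingQueue<String> q, String block) {
        try {
            q.put(block);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static String take(BlockingQueue<String> q) {
        try {
            return q.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a", "b", "c"), 1));
    }
}
```

With a queue capacity of 1, at most a handful of blocks exist at any moment regardless of how long the stream is, which is the memory benefit the slide describes; the overlap between stages is where the execution-time saving comes from.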



Workflows and inter-operability
• A workflow is a simple way of representing a
  complex set of related, ordered actions
  o   A de facto standard
  o   Very expressive
• How to standardise and promote inter-operability?
  o   Use a facade and exploit workflows behind a well-defined
      interface
  o   Facilitate inter-operability with other data products




Why another layer can be good
• Data providers retain control of their data
• A place to hide database heterogeneities
  o   Yields thinner clients
• A place to enforce additional security
  o   Hide the actual location of the data
  o   Filter the data according to the rights of clients
  o   Manage access to federations, databases, tables,
      documents, files, rows, lines
• A place to define views on read-only databases
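Filtering data according to a client's rights can be sketched as a row filter applied in the middle layer. The example below is a hedged illustration only: the class is hypothetical, each row is a plain array of (patient name, doctor DN), and the sample values are for illustration. In a real deployment the client's DN would come from the grid security components rather than a method argument.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: return only the rows a client is entitled to see. */
final class RowFilterSketch {

    /** Each row is {patientName, doctorDn}; keeps rows whose doctor DN matches the client. */
    static List<String[]> visibleTo(List<String[]> rows, String clientDn) {
        return rows.stream()
            .filter(row -> row[1].equals(clientDn))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative sample rows.
        List<String[]> rows = Arrays.asList(
            new String[] {"Ken", "US-Chicago-R"},
            new String[] {"Josie", "UK-Holby-F"});
        for (String[] row : visibleTo(rows, "US-Chicago-R")) {
            System.out.println(row[0]);
        }
    }
}
```

Placing the check in the extra layer means the database itself needs no per-client accounts, and clients never see rows outside their rights.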


Developing applications
• OGSA-DAI is highly extensible
  o   Data resources, activities, security, presentation layers


• An enabling framework
  o   Save development time
  o   Focus on application-specific features
  o   Get standard functionalities out-of-the-box
       • Queries, updates, transformations, deliveries




Portability
• OGSA-DAI is 100% Java
  o   Runs under Windows, UNIX, Linux


• OGSA-DAI uses web services
  o   Clients can be written in any language and on any
      platform that supports web services




Accessibility
• 100% Java open source freeware
• Compatible with free open source web and grid
  products
  o   Globus Toolkit 4.0.x
  o   Apache Axis/Tomcat
  o   OMII 3.4.0
  o   UNICORE – by OMII-Europe
  o   VOMS – by OMII-Europe




Second and third hands-on sessions



Go to:
http://homepages.nesc.ac.uk/~elias/issgc09/html/practical.html#ScenarioTwoDataIntegration




Further information


• WWW site          : http://www.ogsadai.org.uk
• Info              : info@ogsadai.org.uk
• Users e-mail list : users@ogsadai.org.uk






Session 43 :: Accessing data using a common interface: OGSA-DAI as an example

  • 1. Sessions 43 & 44 Accessing data using a common interface: OGSA-DAI as an example Elias Theocharopoulos and Tilaye Alemu ISSGC ‘09 – Sophia Antipolis – Tuesday, 14th July 2009 web: www.omii.ac.uk email: info@omii.ac.uk
  • 2. Overview • The problem: Sharing data in a grid • What is OGSA-DAI? • Data-centric workflows • Key OGSA-DAI terms • The OGSA-DAI client toolkit • Use cases and extensibility points • Pros and cons 2 web: www.omii.ac.uk email: info@omii.ac.uk
  • 3. The problem: Sharing and accessing data in a grid 3 web: www.omii.ac.uk email: info@omii.ac.uk
  • 4. Distributed data resources web: www.omii.ac.uk email: info@omii.ac.uk
  • 5. How about a central server? FR FR query data Client web: www.omii.ac.uk email: info@omii.ac.uk
  • 6. Central server pros and cons • Access to up-to-date data • Single point of access • Data in common format • Database can handle joins • Initial overhead in terms of time, effort and cost • Keeping data up to date • Loss of control by data providers o Assuming they even let go • Security and trust web: www.omii.ac.uk email: info@omii.ac.uk
  • 7. How about providing direct access? UK UK ES ES IA IA query data query data query data Translate and join Client web: www.omii.ac.uk email: info@omii.ac.uk
  • 8. Direct access pros and cons • Access to up-to-date data • Fast access • Data providers retain control • Fat clients • Heterogeneity and inconsistency o Data o Databases o Connection o Security • Security overheads for data providers o Manage firewalls and usernames/passwords for multiple clients • Hard to use in grid/web service workflows web: www.omii.ac.uk email: info@omii.ac.uk
  • 9. How about providing a ZIP on the web? UK data ES data IA data HTTP ZIP HTTP ZIP HTTP ZIP GET GET GET UnZIP, translate and join Client web: www.omii.ac.uk email: info@omii.ac.uk
  • 10. ZIP on the web pros and cons • Fast access • Data providers retain control • Very large downloads even if client only needs subset • Providers have to select and ZIP their data • Client has to install data into a local database • Static snapshot web: www.omii.ac.uk email: info@omii.ac.uk
  • 11. Sharing distributed heterogeneous resources with OGSA-DAI UK UK ES ES IA query data query data query IA data Translate and join OGSA-DAI FR FR query data Client web: www.omii.ac.uk email: info@omii.ac.uk
  • 12. Motivation • Grid is about sharing resources • Need to share structured data resources Relational Database XML Database Indexed File 12 web: www.omii.ac.uk email: info@omii.ac.uk
  • 13. What is OGSA-DAI? • Open Grid Services Architecture Data Access Integration • A framework that executes workflows • Workflows are data-centric • Workflow components are designed for data access, integration, transformation and delivery • Can access heterogeneous data resources • Webservice interface • Intended as a toolkit for building higher-level application-specific data services 13 web: www.omii.ac.uk email: info@omii.ac.uk
  • 14. OGSA-DAI’s vision • Sharing data resources to enable collaboration • Data access o Structured data in distributed heterogeneous data resources • Data integration o e.g. expose multiple databases to users as a single virtual database • Data transformation o e.g. expose data in schema X to users as data in schema Y • Data delivery o To where it’s needed by the most appropriate means o e.g. web service, e-mail, HTTP, FTP, GridFTP web: www.omii.ac.uk email: info@omii.ac.uk
  • 15. OGSA-DAI and data-centric workflows web: www.omii.ac.uk email: info@omii.ac.uk
  • 16. OGSA-DAI workflow • Executes workflows • Workflows contain activities o Well-defined functional units o Data goes in, something is done, data comes out o Equivalent to programming language methods • Workflows are submitted by clients o To an OGSA-DAI web service web: www.omii.ac.uk email: info@omii.ac.uk
  • 17. An OGSA-DAI workflow - a simply analogy Countr Capital y UK London France Paris Pays Capital Grande-Bretagne Londres France Paris Convert query Convert from French Run SQL query data from to English English to SELECT Country, Capital FROM Countries French Pays Capita Join l SELECT Pays,Capital Grande-Bretagne Londres the FROM Pays data France Paris SELECT País, Capital FROM Países Convert l'Espagne Madrid Convert query data from l'Italie Rome from French Run SQL query Spanish to to Spanish French Pays Capital País Capital l'Espagne Madrid España Madrid l'Italie Rome Italia Roma web: www.omii.ac.uk email: info@omii.ac.uk
  • 18. How it appears to the client OGSA-DAI workflow(SELECT Pays,Capital Pays Capital FROM Pays) Grande-Bretagne Londres France Paris Client l'Espagne Madrid l'Italie Rome web: www.omii.ac.uk email: info@omii.ac.uk
  • 19. A query-transform-update example Countr Capital Run SQL query y Spain Madrid Italy Rome Convert data from English to Spanish País Capital España Madrid Run SQL update Italia Roma web: www.omii.ac.uk email: info@omii.ac.uk
  • 20. A query-transform-join example FIELDS Run SQL Read file País,Capital query ENTRY=1 España,Madrid Country Capita l Convert data from ENTRY=2 UK London file to relational Italia,Roma France Paris Pays Capital Convert data from l'Espagne Madrid Convert data from English to French Spanish to French l'Italie Rome Pays Capita Pays Capital Grande-Bretagne l Londres l'Espagne Madrid France Paris Join l'Italie Rome Pays Capital Grande-Bretagne Londres France Paris l'Espagne Madrid l'Italie web: www.omii.ac.uk Rome email: info@omii.ac.uk
  • 21. Data integration with OGSA-DAI workflows • Across OGSA-DAI services OGSA DB1 Workflow 1 DAI Data Workflow 2 OGSA DB2 DAI SQLQuery Deliver to Receive from (DB1) OGSA-DAI OGSA-DAI JOIN Deliver SQLQuery (DB2) 21 web: www.omii.ac.uk email: info@omii.ac.uk
  • 22. Key OGSA-DAI terms: activities, resources, workflows 22 web: www.omii.ac.uk email: info@omii.ac.uk
  • 23. OGSA-DAI: Key Term Activity • An activity is a named unit of functionality o A well defined workflow unit o Pluggable o Composable • An activity can have o 0 or more named inputs o 0 or more named outputs • Blocks of data flow from an activity’s output into another activity’s input 23 web: www.omii.ac.uk email: info@omii.ac.uk
  • 24. OGSA-DAI: Key Term Activity (cont.) • Example activities include o Execute an SQL query o ZIP a batch of data o List the files in a directory o Execute an XSL transform on an XML document o Deliver data to an FTP server 24 web: www.omii.ac.uk email: info@omii.ac.uk
  • 25. OGSA-DAI: Key Term Activity (cont.) • Activity Connections o All required inputs must be connected o All outputs must be connected o Optional inputs • Inputs o Literal o Streamed o Types 25 web: www.omii.ac.uk email: info@omii.ac.uk
  • 26. Connecting activities - examples 26 web: www.omii.ac.uk email: info@omii.ac.uk
  • 27. Data grouping: Lists • Special blocks are used to mark the beginning and the end of a list. • A list groups related data as one unit. f1,f2 [byte[]…],[ byte[]..] ReadFromFileActivity • For example ReadFromFileActivity can dynamically take any number of filenames as input. o Without a way to group the output byte arrays we would have no way to differentiate between the binary data of filenames f1 and f2. o Streaming is preserved since for each file a number of byte arrays is produced to be forwarded to coming activities. 27 web: www.omii.ac.uk email: info@omii.ac.uk
  • 28. Passing data internally: OGSA-DAI Tuple • A special type of data passing between activities • A Tuple is a data representation similar to a row of relational data. Each element of a Tuple represent a column. • Tuples are normally grouped in lists and they are preceded by a metadata block. Athens 20 Madrid 22 Rome 25 SqlQuery SELECT city, temp FROM weather; 28 web: www.omii.ac.uk email: info@omii.ac.uk
  • 29. An interesting activity: Tee • There are activities that operate on the level of blocks and are not concerned with the type and values of data they are handling. E.g TeeActivity: [A,B,C,D] [A,B,C,D] TeeActivity [A,B,C,D] No of outputs: 2 29 web: www.omii.ac.uk email: info@omii.ac.uk
  • 30. OGSA-DAI: Key Term Resource • Data request execution resource • Data resources • Data sources • Data sinks • Sessions o A state container associated with a set of workflows o One workflow can lodge state o A subsequent workflow can retrieve it • Requests o One per workflow submitted to a DRER o Access request status 30 web: www.omii.ac.uk email: info@omii.ac.uk
  • 31. OGSA-DAI: Key Term Workflow • A workflow can contain: o Activities • Resource-based: SQLQuery • Non-Resource: Transformation and Delivery o Resources • Targeted by Activities o Other Workflows • Sub workflows • Other types of workflow 31 web: www.omii.ac.uk email: info@omii.ac.uk
  • 32. OGSA-DAI: Key Term Workflow (cont’) • OGSA-DAI can be used as a workflow processing system that is designed to stream data through a set of activities in a pipelined manner. • In the Query->Transform->Deliver workflow, if the activities are well defined all three will be processing concurrently with different portions of the data stream. 32 web: www.omii.ac.uk email: info@omii.ac.uk
  • 33. OGSA-DAI: Key Term Workflow (cont’) • Pipeline workflow consists of a set of chained activities that will be executed in parallel with data flowing between the activities. • Sequence workflow all the sub-workflows added to this workflow will be executed in sequence. For example 1st sub-workflow in a sequence creates a table, 2nd bulk loads transformed data into this table. • Parallel workflow all the sub-workflows added to this workflow will be executed in parallel. 1 2 33 web: www.omii.ac.uk email: info@omii.ac.uk
  • 34. Getting to the first practical: The OGSA-DAI client toolkit. 34 web: www.omii.ac.uk email: info@omii.ac.uk
  • 35. OGSA-DAI client toolkit • OGSA-DAI client toolkit o Construct and submit requests in Java not XML • Toolkit manages interaction with web services via SOAP over HTTP; it handles SOAP request construction and response parsing. o Provides Java abstractions of • Services • OGSA-DAI resources and properties • Requests • Activities 35 web: www.omii.ac.uk email: info@omii.ac.uk
  • 36. The client toolkit • The workflow description is sent to the OGSA-DAI server as an XML document. • Application developer does not need to worry about creating this document. • The client toolkit provides ways of assembling activity workflows programmatically. • We will see how to use the client toolkit during the hands-on session. 36 web: www.omii.ac.uk email: info@omii.ac.uk
  • 37. Service/resource model One Data Data Resource MyDRER Data Two Request Data Request Data Execution Data Execution Resource Service Resource Three Data Data Resource Client Session Session Request Request Management MyRequest123456 Service 37 web: www.omii.ac.uk email: info@omii.ac.uk
  • 38. Client Toolkit Activities • One client activity per server activity • Same input and output names • Plus some convenience methods For example: • Retrieve results as a JDBC ResultSet from a TupleToWebRowSet activity. • Retrieve update count as an Integer from a SQLUpdate activity 38 web: www.omii.ac.uk email: info@omii.ac.uk
  • 39. Step by Step Guide for Writing Clients • Create activities o There’s a corresponding client toolkit activity for each server-side activity DeliverToFTP deliver = new DeliverToFTP(); ReadFromFile readFile = new ReadFromFile(); 39 web: www.omii.ac.uk email: info@omii.ac.uk
  • 40. Connecting activities • Set inputs for each activity (e.g. parameters) • Every input parameter can either be literal input or streamed from another activity o Literal inputs, e.g. for constant parameters: deliver.addFilename("results1.txt"); deliver.addHost(“anonymous@test.ogsadai.org.uk:21"); o Connect input to the output of another activity to stream data deliver.connectDataInput(readFile.getDataOutput()); 40 web: www.omii.ac.uk email: info@omii.ac.uk
  • 41. Gaining access to the results • If the output of an activity can be provided in a user-friendly type, then there are methods to access the results: o Check whether there are more results to be retrieved boolean hasNext = sqlUpdate.hasNextResult(); o Get the next result in a convenient type int count = sqlUpdate.getNextResult(); 41 web: www.omii.ac.uk email: info@omii.ac.uk
  • 42. Build and execute the Workflow Request • Create workflow and add activities to them • A data service executes the workflow and returns a response (or an error!) • The response may contain data (depending on the activities) • Each client toolkit activity provides utility methods for retrieving its response data 42 web: www.omii.ac.uk email: info@omii.ac.uk
  • 43. First hands-on session Go to : http://homepages.nesc.ac.uk/~elias/issgc09/html/practical.html 43 web: www.omii.ac.uk email: info@omii.ac.uk
  • 44. Extensibility points & components 44 web: www.omii.ac.uk email: info@omii.ac.uk
  • 45. Extending OGSA-DAI: What • OGSA-DAI o A Framework o Extensible • Out of the Box is the basics o Different applications have different needs o New Sources of Data o New Functionality 45 web: www.omii.ac.uk email: info@omii.ac.uk
  • 46. Extending OGSA-DAI: Overview Presentation Layer OMII GT Axis New Message Frameworks UNICORE WS-DAI ? gLite Embedded OGSA-DAI Core Workflow Execution Engine Persistence and Configuration Activity Framework Sessions XPathQuery XSLTransform Request SQLQuery DeliverToURL MyOwnActivity Data Source New Functionality Data Sink Data Resources New Types of Data 46 web: www.omii.ac.uk email: info@omii.ac.uk
  • 47. Extending OGSA-DAI: Activities • Activities do some unit of work • Specific transformation o Data Format: SWISS-PROT to format X • Delivery o Deliver to a target service • Data analysis and Integration o Combine data from different sources 47 web: www.omii.ac.uk email: info@omii.ac.uk
  • 48. Extending OGSA-DAI: Resources • New resources – why? o New Products o New Applications o Specialised Access • Required: o DataResource o DataResourceState o ResourceAccessor 48 web: www.omii.ac.uk email: info@omii.ac.uk
  • 49. Extending OGSA-DAI: Remote Resource • Accessing Resources on Remote OGSA-DAI • Avoid replication of resources • Security Issues o Devolved to Local OGSA-DAI o Security between OGSA-DAI Deployments 49 web: www.omii.ac.uk email: info@omii.ac.uk
  • 50. SQL views • Define a drPatient view o SELECT id, name, age, sex, doctor.name as drName FROM patient, doctor WHERE patient.DrID = doctor.ID; ID Name DN ID Name Age Sex ZIP Dr ID 123 Greene US-Chicago-G 1 Ken 42 M IL1478305 456 456 Ross US-Chicago-R 2 Josie 25 F BN1 7QP 789 789 Fairhead UK-Holby-F • Client runs SELECT * FROM drPatient; • Shorthand for complex query results • Data access control e.g. users of drPatient o Cannot access a patient’s ZIP o Are unaware of the doctor or patient tables web: www.omii.ac.uk email: info@omii.ac.uk
  • 51. OGSA-DAI SQL views • OGSA-DAI SQL views data resource o Represents a view across a database exposed by an OGSA-DAI relational resource • SQLQuery activity o Parses query o Splices in view definition o Submits transformed query to database • Can define views for read-only databases • Schema transformation o Map a logical schema to a physical schema web: www.omii.ac.uk email: info@omii.ac.uk
  • 52. OGSA-DAI SQL views and security • Factor in client’s security credentials • e.g. define drPatient view as o SELECT patients.* FROM patients, doctor WHERE patients.DrID = doctor.ID AND d.dn = $DN$; • Replace $DN$ by client’s DN provided by Grid security components • Doctors can only view their own patients web: www.omii.ac.uk email: info@omii.ac.uk
  • 53. Distributed query processing • OGSA-DQP o Developed by Universities of Manchester and Newcastle o Refactored for OGSA-DAI 3.0 by EPCC as part of the NextGrid project o OGSA-DAI DQP package • Multiple tables on multiple databases are exposed to clients as multiple tables in one “virtual database” • Clients are unaware of the multiple databases • Databases can be exposed o EITHER within one OGSA-DAI server o OR via multiple remote OGSA-DAI servers web: www.omii.ac.uk email: info@omii.ac.uk
  • 54. OGSA-DAI DQP 3b: SELECT Annotations_Ratings.ID, 3a: SELECT Annotations_Ratings.Confidence Archeo_Finds.ID, FROM Annotation_Ratings Archeo_Finds.Provenance WHERE FROM Archeo_Finds; OGSA-DAI OGSA-DAI Annotations_Ratings.Confidence > 0.99 3: Execute 4: Push results sub-queries OGSA-DAI (DQP query evaluator) 5: Combine and post- process – do the JOIN 2: Parse query and OGSA-DAI (core + DQP coordinator) form query plan 5: Results 1: SELECT Archeo_Finds.ID, Archeo_Finds.Provenance, Annotations_Ratings.Confidence FROM Annotations_Ratings, Client HGV_June WHERE Annotations_Ratings.Confidence > 0.99 AND Annotations_Ratings.ID = Archeo_Finds.ID; web: www.omii.ac.uk email: info@omii.ac.uk
  • 55. OGSA-DAI workflows – a de facto standard • OGSA-DAI workflows are a de facto standard o Of use to many projects as we’ll see • For some applications workflows are too powerful o Too expressive o Infer semantics from names of activities available on server • Must interrogate the server o Problems using OGSA-DAI services in workflow engines e.g. Taverna o Not compatible with existing data analysis tools
  • 56. Facades • Define facades on top of OGSA-DAI • Why? o Provide interfaces with more tightly-defined semantics o Comply with standards o Exploit existing data analysis tools • Continue to exploit the power of workflows under the hood o “Canned workflows” o Templates selected and populated, executed and parsed o Map service operations to “template” OGSA-DAI workflows
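The "canned workflow" idea can be sketched as below. Everything here is hypothetical and for illustration only: the template format, the operation name countCasesByZip, the CasesFacade class and the injected execute function are not OGSA-DAI APIs. The sketch only shows the pattern of selecting a template, populating it, executing it and parsing the result behind a narrow interface.

```python
# Hypothetical workflow templates: each is a list of (activity, parameters).
TEMPLATES = {
    "countCasesByZip": [
        ("SQLQuery", {
            "resource": "{resource}",
            "expression": ("SELECT zip, COUNT(*) AS total FROM Cases "
                           "GROUP BY zip ORDER BY zip"),
        }),
        ("Deliver", {}),
    ],
}


def populate(operation, **arguments):
    """Select the template for an operation and fill in its placeholders."""
    return [
        (activity,
         {key: value.format(**arguments) for key, value in parameters.items()})
        for activity, parameters in TEMPLATES[operation]
    ]


class CasesFacade:
    """Narrow, well-defined interface; the workflow stays under the hood."""

    def __init__(self, execute):
        # execute is whatever submits a workflow to a server (injected).
        self._execute = execute

    def count_cases_by_zip(self, resource):
        workflow = populate("countCasesByZip", resource=resource)
        raw_rows = self._execute(workflow)
        # Parse the raw result into the shape the facade promises.
        return {zip_code: total for zip_code, total in raw_rows}


# Stand-in executor that records the workflow and returns canned rows.
submitted = []


def fake_execute(workflow):
    submitted.append(workflow)
    return [("15112", 3), ("15144", 1)]


facade = CasesFacade(fake_execute)
counts = facade.count_cases_by_zip("HealthDB")
```

Callers of the facade see one operation with fixed semantics and never construct or inspect a workflow themselves.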
  • 57. Grid-enabling existing data-related products • (Diagram: a data analysis tool, an OGSA-DAI mediator and OGSA-DAI)
  • 58. OGSA-DAI in action
  • 59. VOTES – data with different schemas distributed across multiple databases within a group of strategic partners • Virtual Organisations for Trials and Epidemiological Studies (VOTES) o http://labserv.nesc.gla.ac.uk/projects/votes/index.html o UK Medical Research Council project • Data access and integration in the clinical domain o Relational databases – Microsoft SQL Server, Access, … o Distributed database joins • Patient information • Clinical trials records o Linking key is Scotland’s CHI number
  • 60. VOTES – cross-database join activity workflow
  • Workflow (run by OGSA-DAI across DB1 and DB2)
    o SQLQuery (DB1): SELECT CHI, Sex, DOB FROM Patients ORDER BY CHI, producing (CHI, Sex, DOB)
    o SQLQuery (DB2): SELECT CHI, Diagnosis FROM TrialX ORDER BY CHI, producing (CHI, Diagnosis)
    o Merge Join of the two ordered data streams, producing (CHI, Sex, DOB, Diagnosis)
    o Deliver
  • This is equivalent to running: SELECT chi, sex, DOB, diagnosis FROM patients, trialX WHERE patients.chi = trialX.chi;
  • patients and trialX are in two different databases
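The join step can be illustrated with a minimal merge join over two streams that are already sorted by CHI, which is why both sub-queries use ORDER BY CHI. This is a sketch of the general technique, not the VOTES activity itself. It assumes CHI is unique in the patient stream; the trial stream may contain several rows per CHI. The sample CHI values are made up.

```python
def merge_join(patients, trial_rows):
    """Join two iterables of tuples sorted by their first element (CHI).

    Assumes keys are unique in `patients`; `trial_rows` may repeat a key.
    Yields patient tuples extended with the non-key trial columns.
    """
    trial_iter = iter(trial_rows)
    trial = next(trial_iter, None)
    for patient in patients:
        # Skip trial rows whose key is smaller than the current patient's.
        while trial is not None and trial[0] < patient[0]:
            trial = next(trial_iter, None)
        # Emit one joined row per matching trial row.
        while trial is not None and trial[0] == patient[0]:
            yield patient + trial[1:]
            trial = next(trial_iter, None)


# Made-up sample data, already ordered by CHI as the sub-queries guarantee.
patients = [
    ("0000000001", "F", "1970-01-01"),
    ("0000000002", "M", "1980-02-02"),
    ("0000000003", "F", "1990-03-03"),
]
trial_x = [
    ("0000000001", "A"),
    ("0000000003", "B"),
    ("0000000003", "C"),
    ("0000000004", "D"),
]

joined = list(merge_join(patients, trial_x))
```

Because each stream is read once and in order, the join needs to hold only one row from each side at a time, so it works on streamed data without buffering either table.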
  • 61. Public Health Grid – data with different schemas distributed across multiple databases within a group of strategic partners • US Public Health Grid o US Centers for Disease Control o University of Pittsburgh o Tarrant County Public Health Department o Dallas County Public Health Department • Real-time Outbreak and Disease Surveillance o Health query system o Look for incidences of some disease on the rise over an area o Historical and live data • Health centres maintain their own databases o Distributed databases o Different products and schemas • e.g. PatientID, Id, PatientIdentifier, PatientNumber o Security and privacy are important
  • 62. Public Health Grid – workflows, DQP and views
  • (Diagram: databases DB1 to DB6 exposed through OGSA-DAI servers and combined using a workflow, OGSA-DQP and views)
  • Cases view: SELECT * FROM DB1.Cases UNION DB2.Cases UNION DB4.Cases
  • SQLQuery (DB6): SELECT zip, count(*) as total FROM Cases WHERE Reason = 'Flu' GROUP BY zip ORDER BY zip
  • Example results: (15112, 3), (15144, 1)
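The combined Cases view and the aggregate query can be tried with a small sketch, assuming SQLite with three attached in-memory databases standing in for DB1, DB2 and DB4. The rows are invented so that the totals match the example results on the slide. One deliberate deviation: the sketch uses UNION ALL rather than UNION, because with only zip and Reason columns a plain UNION would collapse identical rows and undercount cases.

```python
import sqlite3

# Autocommit mode, so ATTACH is never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Invented sample rows: (zip, Reason) per health centre database.
SAMPLE_CASES = {
    "db1": [("15112", "Flu"), ("15112", "Flu"), ("15144", "Cold")],
    "db2": [("15112", "Flu"), ("15144", "Flu")],
    "db4": [("15217", "Asthma")],
}

for name, cases in SAMPLE_CASES.items():
    conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
    conn.execute(f"CREATE TABLE {name}.Cases (zip TEXT, Reason TEXT)")
    conn.executemany(f"INSERT INTO {name}.Cases VALUES (?, ?)", cases)

# The "Cases" view is spliced in as a derived table over all three sources.
QUERY = """
SELECT zip, COUNT(*) AS total
FROM (SELECT zip, Reason FROM db1.Cases
      UNION ALL SELECT zip, Reason FROM db2.Cases
      UNION ALL SELECT zip, Reason FROM db4.Cases) AS Cases
WHERE Reason = 'Flu'
GROUP BY zip
ORDER BY zip
"""

totals = conn.execute(QUERY).fetchall()
```

The client's query mentions only Cases; which databases contribute rows is decided entirely by the view definition.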
  • 63. SEE-GEO – working with private and public data • SEcurE access to GEOspatial services o http://edina.ac.uk/projects/seesaw/seegeo/index.html o EDINA, MIMAS, NeSC, NCeSS o UK JISC project • Geographical information systems • Virtual integration of and access control to o Census data – geo-data access service o Borders data – web feature service o Data hosted by other organisations and exposed as services
  • 64. SEE-GEO – geo-linking service portal
  • (Diagram: GLS Portal, OGSA-DAI workflow of Get, Get, Join, Transform and Deliver activities, MIMAS Census, UK BORDERS, Image Creation Service, Maps)
  • 1: GLSQuery submitted via portal e.g. “Leeds population distribution by census output area”
  • 2: Workflow is populated with query parameters and run
  • 3: Image is placed on a map server
  • 4: URL of image is returned to portal – avoids costly SOAP/HTTP transfer of image
  • 5: Portal gets image using URL
  • 65. What did OGSA-DAI give SEE-GEO? • Could implement GLS service without OGSA-DAI • But using OGSA-DAI allowed leverage of o Workflow engine o Out-of-the-box activities for • Queries • Delivery o Security o Other grid technologies, e.g. GridFTP
  • 66. What did OGSA-DAI give SEE-GEO? • A toolkit to o Develop domain-specific activities o Develop support for domain-specific data resources o Execute workflows using these o Build OGC Web Processing Services (WPS) • Relatively little effort to o Choose different data resources dynamically o Merge GDAS XML into a relational data resource o Transfer data using GridFTP o Protect data using GSI o Experiment!
  • 67. Why OGSA-DAI?
  • 68. Workflows • A workflow can represent a complex data management scenario, involving: o Data access o Transformation o Filtering o Updating o Numerous distributed, heterogeneous databases
  • 69. Workflows and performance • OGSA-DAI is one more layer between clients and data • Therefore, OGSA-DAI is not as fast as a direct connection to a database o OGSA-DAI uses JDBC so will never be as fast as a direct JDBC connection • But this is not what OGSA-DAI is designed to do
  • 70. Workflows and performance • Having a server execute workflows yields o Thinner clients with lower memory and CPU requirements o Minimised client-server communication overheads • Activities process data on the server o Minimises data movement o As opposed to BPEL or Taverna or web service-based workflow engines which pass data to and fro via web services • Data streaming o Activities work on different parts of the data stream in parallel o Reduces memory footprint on server o Reduces execution time
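The data streaming point can be illustrated with a small generator pipeline. This is only an analogy for how streamed activities behave: Python generators interleave the stages on a single thread and do not run them in parallel as the slide describes, but they show the same property that each item flows through the whole chain before the next one is produced, so no stage has to hold the full data set. The activity names are hypothetical.

```python
# Record the order in which the stages touch each item.
events = []


def produce(count):
    """Stand-in for a query activity that streams rows one at a time."""
    for row in range(count):
        events.append(f"produce {row}")
        yield row


def transform(rows):
    """Stand-in for a transformation activity working on the stream."""
    for row in rows:
        events.append(f"transform {row}")
        yield row * 10


def deliver(rows):
    """Stand-in for a delivery activity consuming the stream."""
    delivered_rows = []
    for row in rows:
        events.append(f"deliver {row}")
        delivered_rows.append(row)
    return delivered_rows


delivered = deliver(transform(produce(2)))
```

The recorded events alternate between the stages item by item, rather than showing all production first and all delivery last.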
  • 71. Workflows and inter-operability • A workflow is a simple way of representing a complex set of related, ordered actions o A de facto standard o Very expressive • How to standardise and promote inter-operability? o Use a facade and exploit workflows behind well-defined interface o Facilitate inter-operability with other data products
  • 72. Why another layer can be good • Data providers retain control of their data • A place to hide database heterogeneities o Yields thinner clients • A place to enforce additional security o Hide the actual location of the data o Filter the data according to the rights of clients o Manage access to federations, databases, tables, documents, files, rows, lines • A place to define views on read-only databases
  • 73. Developing applications • OGSA-DAI is highly extensible o Data resources, activities, security, presentation layers • An enabling framework o Save development time o Focus on application-specific features o Get standard functionalities out-of-the-box • Queries, updates, transformations, deliveries
  • 74. Portability • OGSA-DAI is 100% Java o Runs under Windows, UNIX, Linux • OGSA-DAI uses web services o Clients can be written in any language and on any platform that supports web services
  • 75. Accessibility • 100% Java open source freeware • Compliant with free open source web and grid products o Globus Toolkit 4.0.x o Apache Axis/Tomcat o OMII 3.4.0 o UNICORE – by OMII-Europe o VOMS – by OMII-Europe
  • 76. Second and third hands-on sessions Go to: http://homepages.nesc.ac.uk/~elias/issgc09/html/practical.html#ScenarioTwoDataIntegration
  • 77. Further information • WWW site: http://www.ogsadai.org.uk • Info: info@ogsadai.org.uk • Users e-mail list: users@ogsadai.org.uk