Scientific Applications of
   Data Distribution Service
    Svetlana Shasharina#, Nanbor Wang,
Rooparani Pundaleeka, James Matykiewicz and
              Steve Goldhaber
                 http://www.txcorp.com
                 # sveta@txcorp.com
Outline

• Introduction
   – Tech-X Corporation
   – Some driving scientific applications
   – Common requirements
• QuIDS project
   – Workflow management
   – Security experiments
   – Python DDS implementation
• Summary

       Tech-X Corporation
• Founded in 1994, located in Boulder, CO
• 65 people (mostly computational physics, applied math,
  applied computer science)
• Have been merging CS (C++, CORBA, Grid, MPI, GPU,
  visualization and data management) with physics, and are
  now looking at DDS
• Funded by DOE, NASA, DoD and sales
• Applications in
  – Plasma modeling (accelerators, lasers, fusion devices,
    semiconductors) and beam physics
  – Nanotechnology
  – Data analysis
Large Synoptic Survey Telescope (LSST)
• Ground-based digital survey camera to be built in Chile,
  starting operations in 2020 (?). Funded by DOE, NASA,
  universities, and the private sector
• Up to 2000 images/day + calibration data -> 30 TB/day
• Processed locally, then reprocessed and archived in Illinois
  (National Center for Supercomputing Applications)
• Uses OpenSplice for its control software
• Can we help with data management: orchestration of steps
  and monitoring of data processing?
NoVA
• NoVA: NuMI Off-axis ve (electron neutrino)
  Appearance experiment
• Will generate neutrino beams at FNAL and send them to a
  detector in Ash River, Minnesota (500 miles in < 2 ms)
• DOE funded (many labs and universities)
• RMS (Responsive Messaging System) is a DDS-based
  system to pass control and status messages in the NoVA
  data acquisition system (two types of topics, but many
  actual topics to implement point-to-point communications)
• Will eventually need to go over a WAN and provide 10 Hz
  status transmissions between ~100 applications
• Simplifies OpenSplice using traits (as in simd-cxx) to
  minimize the number of data types, mapping topics to
  strings
SciDAC-II LQCD
• LQCD: Lattice Quantum Chromodynamics
  (computational version of QCD: the theory of the strong
  interaction, in which quarks and gluons make up
  hadrons like protons and neutrons)
• DDS is used to monitor clusters doing LQCD
  calculations (detect job failures, evaluate node
  loads and performance, start/kill jobs, etc.)
• Topics for monitoring and control of jobs and
  resources
• Uses OpenSplice
Common themes for scientific apps and DDS
• Real-time requirements are not well estimated
• Common usability needs
   – Support for scientific data formats and data products (into and
     out of topics): domain schemas and data transformation tools
   – Control and monitoring topics (can we come up with a reusable
     schema?)
   – Simple APIs corresponding to the expectations of scientists
   – Ease of modification (evolving systems, not just production
     systems)
   – QoS cookbook (how to get correct combinations)
   – General education
      • Is DDS good for point-to-point (Bill talks only to Pete)?
      • How does one use DDS without killing the system (memory, etc.)?
• Other requirements
   – Site- and community-specific security
   – WAN operation (Chicago and Berkeley, for example)
Common extra expectations
• Automated test harness:
   – How does one test for correct behavior and QoS?
   – Avoid regressions in a rapidly evolving system modified by a
     team
• Interacting with databases (all data should be archived and
  allow queries)
• Can we do everything using DDS, to minimize external
  dependencies?
   – Efficient bulk data transfer (usually point-to-point and BIG
     triangles :-)
   – Workflow engine (a workflow couples applications loosely,
     often through files, and can be distributed, while a
     simulation is typically tightly coupled on an HPC resource)
• Interacting with Web interfaces and Web Services
QuIDS: to address some issues
• QuIDS: Quality Information Distribution System
• Helping the applications above through a Phase II SBIR from
  DOE (HEP office)
• Collaboration of Tech-X and Fermilab
• Goals (we will talk about the ones in red in the rest of this talk):
   – Implement a DDS-based system to monitor distributed
     processing of astronomical data
      • Simplified C++ (done with simd-cxx?) and Python APIs
      • Support for FITS and monitoring and control data, images,
        histograms, spectra
      • Security
      • WAN
      • Testing harness
   – Investigate the use of DDS for workflow management
QuIDS, bird's-eye view (courtesy of FNAL)
QuIDS at FNAL computational domain
[Diagram: within the computational domain, the CampaignManager, a
Monitor, and the application workflows exchange messages through
MCTopic (monitoring/control) and SciTopic (science data) topics,
each component holding the appropriate DDS readers (R) and
writers (W).]
Generic workflows: do we need it all?
• A workflow is something outside of HPC (loosely coupled and
  can tolerate even a WAN, while a simulation is something that
  goes to qsub…)
• Kepler (the de-facto workflow engine expected for DOE
  applications):
   –   Support for complex workflows
   –   Java based
   –   Heavy and hard to learn
   –   Not portable to future platforms (DOE supercomputers might not have
       Java at all)
• Real workflows in astronomy are simple (they do not need the
  expressiveness of a full programming language or the pi-calculus)
   – Pipelines
   – DAGs
• How does one implement such workflows using DDS?
Parallel pipeline: most astronomy workflows
[Diagram: an Initialize step fans work out to several workers;
each Worker(i) runs the sequence Task(0), Task(1), …, Task(N-1);
a Finalize step merges the results.]
      Tasks can be continued by different worker
      processes: data can be passed between them
      (e.g., Worker(1) performs Task(1) using data
      from Worker(0))
ddspipe: current implementation of the workflow engine
•   A parallel pipeline job consists of
     – An initialization phase (splitting data into manageable pieces) running
        on one node
     – Parallel worker processes doing data-processing tasks (possibly not
        sharing the same address space)
     – A finalization step (merging data into a final image or movie)
•   There is an expected order of tasks, so tasks can be numbered and the
    output data of one step serve as input to the next
•   Design decisions for now:
     – Workers do not communicate with each other
     – Workers are given work by a mediating entity, the tuple space manager
        (no self-organization)
     – No scheduling for now (round-robin: tasks and workers are queued in
        the server)
     – Workers can get data coming from a task completed by a different
        worker (we do not address the problem of data transfer yet)
•   All communication is via DDS topics, while the data to work on is
    available to all workers through local files
GDS = Tuple Space, but we want more (?)
•   Tickets:
     – Task ticket (id, in-data, out-data, status)
     – Task status: standby, ready, running, finished
     – Worker ticket (id, task ticket, status)
     – Worker status: ready, busy
•   A classic tuple space = a set of task tickets, and we could use only
    those, but… instead of dealing with a self-organized (wild) system, we
    would like to implement
     – Workflow: M sequences of tasks with matching in- and out-data
     – Scheduling (based on policies, resources, location of workers)
     – Fault tolerance (detecting and rescheduling incomplete tasks)
•   Hence we decided to have a TupleSpaceServer class to address
    these (currently just a pipeline with queues and no fault tolerance);
    a sketch of the ticket structures is shown below
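
To make the ticket vocabulary concrete, here is a minimal Python sketch of the structures described above (class and field names are illustrative, not the actual QuIDS IDL):

from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TaskStatus(Enum):
    STANDBY = 0    # in-data not yet available
    READY = 1      # in-data available, waiting for a worker
    RUNNING = 2    # assigned to a worker and in progress
    FINISHED = 3   # out-data produced

class WorkerStatus(Enum):
    READY = 0
    BUSY = 1

@dataclass
class TaskTicket:
    task_id: int
    in_data: str                     # e.g., path of the input file
    out_data: str                    # path of the produced file
    status: TaskStatus = TaskStatus.STANDBY

@dataclass
class WorkerTicket:
    worker_id: int
    task: Optional[TaskTicket] = None    # currently assigned task, if any
    status: WorkerStatus = WorkerStatus.READY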
ddspipe classes:
• Initializer
   – Splits data, possibly creates workers and workspaces, and publishes
     all initial work tickets with the correct specification of the workflow
• TupleSpaceServer
   – Changes the status of task tickets in accordance with the workflow
     order: once a worker reports that task n is done, the ticket for task
     n+1 with <n+1 in-data> = <n out-data> is changed to ready, and a
     worker topic is published (with the id of the next worker in the
     queue). Once a worker reports that it is doing this work, the status
     is changed to running, etc. (a sketch of this bookkeeping follows)
• Workers
   – Publish their status
   – Listen for task assignments (matching their id to the one in the
     worker ticket)
• Finalizer
   – Whatever is needed to finish up (merge data and clean up)
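
A hedged sketch of the TupleSpaceServer bookkeeping just described, continuing the ticket classes above; in the real system the hand-offs are DDS topic writes, for which the publish_worker_topic callback stands in here (all names are our own, not the actual QuIDS classes):

class TupleSpaceServer:
    """Keeps task tickets in pipeline order and hands work to queued workers."""

    def __init__(self, tasks, publish_worker_topic):
        self.tasks = tasks                  # list of TaskTicket in pipeline order
        self.idle_workers = []              # worker ids, queued round-robin
        self.publish_worker_topic = publish_worker_topic  # stand-in for a DDS write

    def on_worker_status(self, worker_id):
        # A worker announced it is ready: queue it and try to dispatch.
        self.idle_workers.append(worker_id)
        self._dispatch()

    def on_task_status(self, task_id, status):
        # A worker reported progress on a task.
        self.tasks[task_id].status = status
        if status == TaskStatus.FINISHED and task_id + 1 < len(self.tasks):
            # The next task's in-data is the finished task's out-data.
            nxt = self.tasks[task_id + 1]
            nxt.in_data = self.tasks[task_id].out_data
            nxt.status = TaskStatus.READY
        self._dispatch()

    def _dispatch(self):
        # Round-robin: pair the first ready task with the first idle worker.
        # (The real server marks a task running only after the worker
        # reports back that it has started the work.)
        for ticket in self.tasks:
            if ticket.status == TaskStatus.READY and self.idle_workers:
                worker_id = self.idle_workers.pop(0)
                ticket.status = TaskStatus.RUNNING
                self.publish_worker_topic(worker_id, ticket)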
States of Tasks in Tuple Space
[Diagram: every task ticket starts in the Standby state. Task0
becomes Ready on a "jobStatus,Run" message, and each subsequent
Task n becomes Ready when "taskStatus,Task n-1,Completed" arrives.
Tuple space internal scheduling then moves a Ready task to Running
and finally to Completed. Tasks within a sequence execute
sequentially; the sequences proceed independently in parallel.]
Tuple Space Manager Maintains Task Tickets and Schedules Tasks
[Diagram: the Tuple Space Manager keeps a table of tickets (job ID,
task ID, sequence ID, status) and a queue of idle workers. The Job
Initializer submits tickets and jobStatus messages; workers publish
workerStatus and receive workTickets; jobStatus messages flow on to
the Job Finalizer. Workers cycle through Run, Completed, and
Disposable states.]
Status and next steps of ddspipe (beyond what bash can do :-)
• Prototype is working
   – Although we do have some issues with memory and bugs
   – I would like to experiment with no queuing: the next task is open
     for grabs as soon as one of the tasks of the previous stage finishes
• Next steps
   – User definition of the workflow
   – Multiple jobs
   – Separation of the worker manager from the task manager?
   – Implementing workers doing slit spectroscopy based on
     running R
   – DAG support
   – Some scheduling (balance between data transfer and mixing
     data between slow and fast workers?)
   – Data transfer implementation and in-memory data exchange
Security for scientific projects: from nothing to everything
• The OpenSplice enterprise edition provides a Secure Networking
  Service:
   – Security parameters (i.e., data encryption and authentication) are
     defined in Security Profiles in the OpenSplice configuration file.
   – Node authentication and access control are also specified in the
     configuration profile.
   – Security Profiles are attached to partitions and set the security
     parameters in use for that partition.
   – Secure networking is activated as a domain service
• Scientists are not used to paying 
• They are used to:
   – Authenticating and authorizing on connection
   – Rules kept in the administrative area of a virtual organization
     (which DDS should consult)
Providing security in the community edition of OpenSplice
• We tried to replace the lower networking layer of
  OpenSplice with OpenSSL to see how one can provide
  authentication, authorization and encryption
• OpenSSL is an open-source toolkit:
   – Secure Sockets Layer and Transport Layer Security (the newer
     standard, replacing SSL)
   – General-purpose cryptography library
   – Public Key Infrastructure (PKI): e.g., certificate creation, signing
     and checking, CA management
Switching to OpenSSL in the networking layer of community OpenSplice
[Diagram: the OpenSplice stack on each node, Applications over Data
Centric Publish/Subscribe over Real Time Publish/Subscribe over the
RT Net and DDSi transports; the UDP/IP layer at the bottom is
replaced with OpenSSL.]
• The UDP/IP layer handles the interface to the operating
  system's socket interface
• Switching to OpenSSL allowed us to establish a secure
  tunnel between two sites
• But the configuration has to be done for each pair of nodes!
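
A toy Python sketch of why the setup is pairwise: with mutually authenticated TLS, each node must be configured with its own certificate plus the peer's CA, so every node pair needs its own configuration (file names and the port are placeholders; this illustrates the trust setup only, not the actual OpenSplice UDP transport):

#!/usr/bin/python
import socket
import ssl

# Node A accepts a mutually authenticated TLS connection from node B.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="nodeA.crt", keyfile="nodeA.key")  # A's identity
ctx.load_verify_locations(cafile="nodeB-ca.crt")  # trust anchors for B only
ctx.verify_mode = ssl.CERT_REQUIRED               # demand B's certificate

listener = socket.create_server(("0.0.0.0", 7400))
conn, addr = listener.accept()
secure = ctx.wrap_socket(conn, server_side=True)  # TLS handshake happens here
print("received over the tunnel:", secure.recv(1024))
# Node B mirrors this with nodeB.crt/nodeB.key and nodeA-ca.crt --
# adding a third node means generating and distributing yet more pairings.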
Future Directions in Security Work
• Waiting for new developments; we will be happy to use them to
  implement the security expected by the DOE labs (our
  collaborators)
• Explore user-data and similar fields to address application
  specifics if they are not addressed by the security profile
PyDDS: Python bindings for DDS communications
• Started with SWIG for wrapping the generated bindings:
  works fine, but needed manual wrapping of multiple
  generated classes
• Next worked with Boost.Python to wrap the communication
  layer (the set of classes that the IDLPP-generated code
  uses to call into OpenSplice for communication), so that
  there is no need to wrap the generated bindings
• Problem: need to take care of inheritance manually
  and deal with several handles that are unexposed C
  structs used in forward declarations
Status and next steps for PyDDS
• Hand-wrapping of the C++ bindings using SWIG works:
#!/usr/bin/python
import TxddsPy

qos = TxddsPy.Qos()
qos.setTopicReliable()              # reliable delivery for the topic...
qos.setWriterReliable()             # ...and for the data writer
writer = TxddsPy.RawImageWriter("rawImage")  # writer for the rawImage topic
writer.setTopicQos(qos)
writer.setWriterQos(qos)
data = TxddsPy.rawImage()           # an empty rawImage sample
writer.writeData(data)              # publish it
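
For symmetry, a hypothetical subscriber counterpart; RawImageReader, setReaderReliable and takeData are our guesses mirroring the writer API above, not confirmed TxddsPy names:

#!/usr/bin/python
import time
import TxddsPy

qos = TxddsPy.Qos()
qos.setTopicReliable()
qos.setReaderReliable()             # assumed analogue of setWriterReliable()
reader = TxddsPy.RawImageReader("rawImage")  # assumed reader class
reader.setTopicQos(qos)
reader.setReaderQos(qos)
while True:
    data = reader.takeData()        # assumed: returns None if nothing arrived
    if data is not None:
        print("received a rawImage sample")
    time.sleep(0.1)                 # poll at 10 Hz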

• Next:
    – Investigate wrapping the communication classes so that we
      expose the minimum to Boost.Python
    – Or develop tools that generate the glue code needed by
      Boost.Python from a list of topic names
QuIDS summary and future directions
• We have prototyped DDS-based solutions for astronomy data
  processing applications
   –   Tools for bringing FITS data into DDS
   –   Simple QoS scenarios (getting pre-published data)
   –   Parallel pipeline workflows
   –   Security studies
   –   Python APIs
• Possible next steps
   –   A language to describe workflows for user input into the system
   –   More complex workflows and concrete implementations
   –   Implementing FNAL security requirements
   –   WAN communication between FNAL and LBNL going through firewalls
   –   Streamlining glue-code generation for Python
   –   Testing harness
   –   Bulk data transfers
   –   Archiving data into databases
   –   Web interfaces
Acknowledgements

• Ground Data System (GDS) team from Fermi National
  Accelerator Laboratory
• PrismTech
• Nikolay Malitsky (BNL)
• OpenSplice mailing list and its contributors
• US Department of Energy, Office of High Energy
  Physics
