Efficient Production of Synthetic
Skies for the Dark Energy Survey




             Raminder Singh
             Science Gateways Group
         Indiana University, Bloomington.
                  ramifnu@iu.edu
Background & Explanation

•    First project to combine four different statistical probes
     of dark matter and dark energy:
    • Baryon acoustic oscillations in the matter power spectrum.
    • Abundance and spatial distribution of galaxy groups and clusters.
    • Weak gravitational lensing by large-scale structure.
    • Type Ia supernovae.
    These four probes are quasi-independent.

•   5000-square-degree survey of cosmic structure traced by
    galaxies.
•   The Simulation Working Group is generating simulations of
    galaxy yields in various cosmologies.
•   Analysis of these simulated catalogs offers a quality
    assurance capability for cosmological and astrophysical
    analysis of upcoming DES telescope data.
Blind Cosmology Challenge (BCC)


• N-body simulation evolves the dark matter.
• Post-processed using an empirical approach to link galaxy
  properties to dark matter structures (halos).
• Gravitational lensing shear computed with a new Spherical
  Harmonic Tree code.
• ADDGALS methodology with empirical tuning.

Figure: Processing steps to build a synthetic galaxy catalog. The Xbaya workflow currently controls the top-most element (N-body simulations), which consists of methods to sample a cosmological power spectrum (ps), generate an initial set of particles (ic), and evolve the particles forward in time with Gadget (N-body). The remaining methods are run manually on distributed resources.
Project Plan

• The project requested ECSS support to automate and
  optimize processing on XSEDE resources.
• ECSS staff member Dora Cai worked with the group on the
  work plan.
• The work plan was discussed and a timeline was assigned to
  each task.
• The work plan called for porting all codes to TACC Ranger.
Main Goals

• Automated (re)submission of jobs to remove manual
  inefficiencies
  – The walltime limit of the TACC Ranger long queue is 48
    hours, so jobs need restarts (see the sketch below).
  – Multiple simulations can run in parallel.
• Automatic data archival
  – Data need to be archived on Ranch.
  – Data need to be moved to SLAC for post-processing.
• Provenance
• User portal
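
A minimal sketch of such a resubmission loop, in the spirit of the Do-While construct mentioned later in the roadmap. The job-script name, the SGE scheduler commands, and the trailing restart flag (Gadget convention: 1 = resume from restart files) are illustrative assumptions, not the project's actual interfaces.

    import os
    import subprocess
    import time

    def submit(param_file, restart):
        """Submit the LGadget job through SGE; the job script name and the
        trailing '1' restart flag (Gadget convention) are assumptions."""
        cmd = ["qsub", "-terse", "lgadget.job", param_file]
        if restart:
            cmd.append("1")
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout.strip()

    def still_queued(job_id):
        """SGE's qstat -j exits nonzero once the job has left the system."""
        return subprocess.run(["qstat", "-j", job_id],
                              capture_output=True).returncode == 0

    def finished(workdir, final_snapshot=35):
        """Treat the run as complete once the final snapshot exists."""
        return os.path.exists(os.path.join(
            workdir, f"snapshot_{final_snapshot:03d}"))

    # Do-while loop: submit, wait out the 48-hour walltime window, and
    # resubmit from restart files until the final snapshot appears.
    job_id = submit("lgadget.param", restart=False)
    while True:
        while still_queued(job_id):
            time.sleep(300)
        if finished("run_dir"):
            break
        job_id = submit("lgadget.param", restart=True)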
Development Roadmap
July – Sept 2011
Developments:
• Deployed and tested the N-body simulation code on Ranger.
• Developed and tested the individual codes using Apache Airavata.
• Developed the BCC Parameter Maker for workflow configuration.
Lessons learned:
• Added module-loading support on Ranger using the GRAM RSL parameters mod and mod_del.

Sept – Dec 2011
Developments:
• Enhanced the Parameter Maker script to accept inputs from the Airavata workflow.
• Full workflow run for a single, smaller N-body simulation.
Lessons learned:
• The 48-hour walltime limit of the Ranger queue.

Jan – Mar 2012
Developments:
• Medium-scale testing.
• Restart capability using Do-While constructs.
• Ran multiple simulations in parallel for production runs.
• Evaluated file movement using GridFTP, bbcp, and Globus Online.
Lessons learned:
• The Globus Online client library is required to initiate transfers from the workflow.
Development Roadmap
April – June 2012
Developments:
• Ran a few more production simulations, 2 in parallel.
• Changed I/O routines to fix issues.
• Able to produce full simulation data for post-processing.
Lessons learned:
• Lustre issue with too many files created in a single folder.
• GRAM canceled jobs after a few hours in case of a connection timeout with the client.

July – Sept 2012
Developments:
• Investigated the job-canceling issue and reported it to the Globus team.
• Changed the COG API to add debug statements for the issue.
• Ran more production boxes.
• Migration to SDSC Trestles.
Lessons learned:
• The Globus team confirmed the issue is not on the server side but in the client, which didn't give a proper error to debug.
• MPI library loading and RSL parameters were resolved using GRAM on Trestles.

Oct – Dec 2012
Developments:
• Ran 2 full 4-box simulations using the workflow.
• Resolved the job-cancel issue for the latest run.
Lessons learned:
• Restart files were not written properly because of a new exception.
Xbaya Workflow
• Three large-volume realizations with 2048³ particles using 1024
  processors.
• One smaller volume of 1400³ particles using 512 processors.
• These 4 boxes need about 300,000 SUs.
• Each cosmology simulation entails roughly 56 TB of data.
Workflow Description

• BCC Parameter Maker
This initial setup code is a Python script that prepares the necessary configuration
   and parameter files for the workflow execution.
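
For orientation, a minimal sketch of what such a parameter maker might emit; the file names, keys, and values below are illustrative assumptions rather than the project's actual configuration.

    from pathlib import Path

    # Illustrative cosmology and box settings (assumed values).
    cosmology = {"hubble": 70.0, "ombh2": 0.0226, "omch2": 0.112}
    box = {"box_size_mpc_h": 1050.0, "n_particles": 2048**3, "seed": 12345}

    def write_params(outdir: Path) -> None:
        """Write simple key = value parameter files for each pipeline stage."""
        outdir.mkdir(parents=True, exist_ok=True)
        stages = {"camb.params": cosmology,
                  "2lpt.params": {**cosmology, **box},
                  "lgadget.params": box}
        for name, params in stages.items():
            text = "\n".join(f"{k} = {v}" for k, v in params.items())
            (outdir / name).write_text(text + "\n")

    write_params(Path("bcc_run_001"))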
• CAMB
The CAMB (Code for Anisotropies in the Microwave Background) application computes the
   matter power spectrum, which is needed to generate the simulation initial
   conditions.
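
CAMB is driven by an ini-style parameter file and invoked with that file as its single command-line argument. A hedged sketch: the keys follow CAMB's standard params.ini, but the values are placeholders, and a camb binary is assumed to be on the PATH.

    import subprocess
    from pathlib import Path

    # A few standard CAMB params.ini keys; values are placeholders.
    Path("params.ini").write_text(
        "output_root = bcc_run_001\n"
        "get_scalar_cls = F\n"
        "get_transfer = T\n"
        "hubble = 70\n"
        "ombh2 = 0.0226\n"
        "omch2 = 0.112\n"
        "transfer_redshift(1) = 0\n"
        "transfer_matterpower(1) = matterpower.dat\n")

    # CAMB writes the matter power spectrum (prefixed with output_root)
    # for downstream use by 2LPTic.
    subprocess.run(["camb", "params.ini"], check=True)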
• 2LPTic
The Second-order Lagrangian Perturbation Theory initial conditions code (2LPTic) is
   an MPI C code that computes the initial conditions for the simulation from
   parameters and the input power spectrum generated by CAMB.
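
Both 2LPTic and LGadget run as MPI executables; a workflow step ultimately wraps an invocation like the sketch below (the executable and parameter-file names are assumptions, and TACC machines substitute their ibrun launcher for mpirun).

    import subprocess

    # Generate initial conditions from the CAMB power spectrum; 512 ranks
    # matches the smaller 1400^3 box described earlier.
    subprocess.run(["mpirun", "-np", "512", "./2LPTic", "2lpt.params"],
                   check=True)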
• LGadget
The LGadget simulation code is an MPI-based C code that uses a TreePM algorithm to
   evolve a gravitational N-body system. The outputs of this step are system-state
   snapshot files, lightcone files, and properties of the matter distribution,
   including diagnostics such as total system energies and momenta. The total output
   from LGadget depends on resolution and the number of system snapshots stored, and
   approaches 10 TB for large DES simulation volumes.
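
A back-of-envelope check of that figure, assuming roughly 32 bytes per particle per snapshot (single-precision positions and velocities plus an 8-byte ID) and a few dozen stored snapshots; both numbers are illustrative assumptions.

    n_particles = 2048 ** 3     # the largest DES simulation boxes
    bytes_per_particle = 32     # 3x4 B pos + 3x4 B vel + 8 B ID (assumption)
    n_snapshots = 36            # illustrative snapshot count

    snapshot_tb = n_particles * bytes_per_particle / 1e12
    print(f"{snapshot_tb:.2f} TB/snapshot, "
          f"~{snapshot_tb * n_snapshots:.0f} TB total")
    # -> 0.27 TB per snapshot, ~10 TB total, consistent with the text.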
Production results

• We have run four full-size boxes with Airavata. The
  following table compares the turnaround time achieved with
  Airavata against manually submitted jobs.




      * Ignores a 1.5-day delay due to a network connection error with
        the job listener and the subsequent GRAM5 job deletion.
Lessons learned

• MPI libraries and job scripts can differ across resources;
  users need to experiment to learn each system.
• User scratch policies can differ across machines, so it can
  take some time to migrate between resources.
• Migration of working codes from one machine to another
  can take weeks to months.
• Grid services and client libraries need first-class support.
  – The XSEDE ticketing system is your best friend!
Future Goals

• Migrate to SDSC Trestles for the next production run.
• The group plans to work with Apache Airavata on future
   extensions:
  • Implement intermediate restart of the workflow after a failure,
     based on provenance information.
  • Post-processing.
• Plan for the TACC Stampede migration.
• We currently use the Globus Online GUI for file transfers but
   would like to integrate its APIs with the workflow (see the
   sketch below).
• Migrate the post-processing and quality-assurance codes to XSEDE
   and develop a post-processing workflow.
• Try to integrate the post-processing steps at SLAC.
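
A hedged sketch of what API-driven transfers could look like, using the present-day globus_sdk Python package, which postdates this work (the 2012 system would have used the Globus Online REST interface). Endpoint IDs, paths, and the access token are placeholders.

    import globus_sdk

    # Placeholders: real endpoint UUIDs and an OAuth2 token are required.
    SRC_EP, DST_EP = "ranger-endpoint-uuid", "slac-endpoint-uuid"
    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer("TRANSFER_TOKEN"))

    # Describe a recursive transfer of one simulation's output directory.
    tdata = globus_sdk.TransferData(tc, SRC_EP, DST_EP,
                                    label="BCC snapshots",
                                    sync_level="checksum")
    tdata.add_item("/scratch/bcc_run_001/", "/des/sims/bcc_run_001/",
                   recursive=True)

    # Submit, then block until completion so a workflow step can gate
    # post-processing on the transfer having finished.
    task_id = tc.submit_transfer(tdata)["task_id"]
    tc.task_wait(task_id, timeout=24 * 3600, polling_interval=60)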
Team & Publication
DES Simulation Working group
      August E. Evrard, Departments of Physics and Astronomy (Michigan)
      Brandon Erickson, grad student (Michigan)
      Risa Wechsler, asst. professor (Stanford/SLAC)
      Michael Busha, postdoc (Zurich)
      Matt Becker, grad student (Chicago)
      Andrey V. Kravtsov, Department of Astronomy and Astrophysics (Chicago)
ECSS
      Suresh Marru, Science Gateways Group (Indiana)
      Marlon Pierce, Science Gateways Group (Indiana)
      Lars Koesterke, TACC (Texas)
      Dora Cai, NCSA (Illinois)
Publication: XSEDE '12:
 Brandon M. S. Erickson, Raminderjeet Singh, August E. Evrard, Matthew R. Becker, Michael T. Busha, Andrey V.
Kravtsov, Suresh Marru, Marlon Pierce, and Risa H. Wechsler. 2012. A high throughput workflow environment for
cosmological simulations. In Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery
Environment: Bridging from the eXtreme to the campus and beyond (XSEDE '12). ACM, New York, NY, USA,
Article 34, 8 pages. DOI=10.1145/2335755.2335830 http://doi.acm.org/10.1145/2335755.2335830


Editor's Notes

  • #3: The Dark Energy Survey is the first project to combine four different statistical probes of dark matter and dark energy. It is a 5000-square-degree survey of cosmic structure traced by galaxies. The Simulation Working Group is focused on generating synthetic skies for the survey; this synthetic data provides quality assurance for the astronomy pipelines and the galaxy catalogs.