Introduction to the
HACC Simulation Data Portal
Globus World 2019; Chicago, May 1, 2019
Katrin Heitmann (Argonne National Laboratory)
Based on: arXiv:1904.11966
Introduction
• In cosmology we study the origin, evolution, and make-up of the Universe
• Many unsolved questions:
○ What is the nature of dark energy and dark matter, making up 95% of the energy-matter budget of our Universe?
○ What is the mass of the lightest particle in the Universe, the neutrino?
○ How can we learn more about the very first moments of the Universe?
• Upcoming cosmological surveys try to answer these questions and rely on detailed, complex simulations
○ Simulations are carried out and analyzed on the largest supercomputers available world-wide
○ Cosmological simulations generate large amounts of data (PBs) to capture the evolution of the Universe faithfully
○ Given the resources required for these simulations, it is crucial to share them with the community to enable the best possible science outcome
[Images: HACC/Galacticus/GalSim; Hubble Ultra Deep Field (NASA)]
What is needed ...
A large-scale effort that provides easy access to a range of simulation products to the world’s cosmologists, as well as analysis capabilities to established survey collaborations
[Architecture diagram: Public access to cosmological data and computational support for collaborations]
• Simulation (HPC allocations, e.g., INCITE, ALCC) and Analysis on Cooley and Theta (10 PF), with simulation and analysis job descriptions passing through a job submission/adaptation layer
• Storage: O(50 PB total)
• Petrel: O(1 PB, 100 TB to start); datasets; Globus Online
• User community via web and community-specific clients: Portal, Globus
• Phoenix: ALCF-hosted, collaboration-controlled resources on physical/virtual machine(s)
• Collaboration-installed web/data interfaces: LSST DM Butler, Jupyter, PDACS (Galaxy), DESCQA, Visualization, Databases, Globus, Workflows
In collaboration with Tom Uram, Mike Papka, Ian Foster
Note on Storage, O(50 PB total): temporary storage, expires with allocation; only collaborators on the project have direct access
What exists ...
• Petrel and Phoenix
• Simulations
• First version of web portal
using Globus
• Petrel: Data Management and Sharing Pilot, hosted at Argonne
• 1.7 PB parallel filesystem
• Embedded in Argonne’s 100+ Gbps network fabric to allow high-speed data transfers
• Web and API access via Globus
• Federated login
• Self-managed by PIs
• https://press3.mcs.anl.gov/petrel/
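As a rough illustration of what a 100 Gbps network fabric means for data movement, the sketch below computes an idealized lower bound on transfer time. It is arithmetic only: the function name and the `efficiency` parameter are assumptions made for this example, and real transfers are also limited by disks, the remote endpoint, and the wide-area path.

```python
def ideal_transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Return a lower bound, in hours, on the time to move `size_tb` terabytes.

    Assumptions (not from the slides): decimal units (1 TB = 10**12 bytes,
    1 Gbps = 10**9 bits/s) and a single link that sustains `efficiency`
    times its nominal rate for the whole transfer.
    """
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    bits = size_tb * 1e12 * 8                      # payload size in bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600.0


if __name__ == "__main__":
    # 1 TB over a fully used 100 Gbps link
    print(f"{ideal_transfer_hours(1.0, 100.0) * 3600:.0f} s")
    # Same payload if only half of the nominal rate is sustained
    print(f"{ideal_transfer_hours(1.0, 100.0, efficiency=0.5) * 3600:.0f} s")
```

Under these assumptions, 1 TB at 100 Gbps takes about 80 seconds at best, and twice that if only half of the nominal rate is sustained.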
• Web portal for easy access to simulations
• Currently: ~82.5 TB in our project, covering three simulation projects
• Step 0: Register with Globus
• Step 1: Select simulation project
• Step 2: Select data products (information about data size is available)
• Step 3: Transfer with Globus to endpoint of your choice
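The steps above can be sketched as a small selection routine. This is not the portal's code: the catalog, project names, product names, paths, and sizes below are all hypothetical placeholders, and the sketch stops at building a list of source and destination paths. Submitting the transfer itself would go through Globus (for example the Globus web app) and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    name: str
    path: str       # hypothetical path on the source endpoint
    size_tb: float  # hypothetical size, shown to the user before transfer


# Hypothetical catalog: every name, path, and size here is a placeholder.
CATALOG = {
    "project_a": [
        DataProduct("halo_catalogs", "/project_a/halos/", 1.5),
        DataProduct("particle_snapshots", "/project_a/particles/", 20.0),
    ],
    "project_b": [
        DataProduct("halo_catalogs", "/project_b/halos/", 0.5),
    ],
}


def select_products(project: str, names: list[str]) -> list[DataProduct]:
    """Steps 1 and 2: pick a simulation project, then data products within it."""
    available = {p.name: p for p in CATALOG[project]}
    missing = [n for n in names if n not in available]
    if missing:
        raise KeyError(f"unknown data products for {project}: {missing}")
    return [available[n] for n in names]


def total_size_tb(products: list[DataProduct]) -> float:
    """Size information shown before the user commits to a transfer."""
    return sum(p.size_tb for p in products)


def build_transfer_items(products: list[DataProduct], dest_root: str) -> list[tuple[str, str]]:
    """Step 3 (preparation only): map each source path to a destination path."""
    root = dest_root.rstrip("/")
    return [(p.path, root + p.path) for p in products]


if __name__ == "__main__":
    chosen = select_products("project_a", ["halo_catalogs", "particle_snapshots"])
    print(f"selected {len(chosen)} products, {total_size_tb(chosen)} TB")
    for src, dst in build_transfer_items(chosen, "/scratch/me/"):
        print(src, "->", dst)
```

Keeping selection and size reporting separate from the transfer request mirrors the portal flow, where users see the data volume before choosing a destination endpoint.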
“The purpose of computing is insight not numbers”
- Richard Hamming
