Building the Next Generation Earth System Grid Federation (ESGF2)
Forrest M. Hoffman (ORNL), Ian Foster (ANL), Sasha Ames (LLNL)
Rachana Ananthakrishnan, Jason Boutte, Nathan Collier, Scott Collis, Carlos Downie, Robert Jacob, Jitendra Kumar, Giri Prakash, Sarat Sreepathi, and Min Xu
What is the Earth System Grid Federation?
● The Earth System Grid Federation (ESGF) is a globally distributed peer-to-peer network of data servers using a common set of protocols and interfaces to archive and distribute Earth system model (ESM) output
● ESM output data are used by scientists all over the world to investigate consequences of possible climate change scenarios and the resulting Earth system feedbacks
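One of the shared interfaces behind the "common set of protocols" is a faceted search service on each index node. The sketch below only builds a query URL for the long-standing esg-search REST interface and does not contact a server. Several things are assumptions: the node URL (index nodes have changed over time), the facet names chosen, and whether ESGF2 keeps this exact interface.

```python
from urllib.parse import urlencode

# Assumed index-node endpoint for illustration; substitute any node that
# serves the esg-search API.
INDEX_NODE = "https://esgf-node.llnl.gov/esg-search/search"


def build_search_url(base_url: str = INDEX_NODE, limit: int = 10, **facets: str) -> str:
    """Build a dataset query URL for an ESGF index node.

    Facets such as project, experiment_id, and variable_id are passed through
    as query parameters; results are requested as Solr-style JSON.
    """
    params = {"type": "Dataset", "format": "application/solr+json", "limit": limit}
    params.update(facets)
    return f"{base_url}?{urlencode(params)}"


print(build_search_url(project="CMIP6", experiment_id="historical", variable_id="pr"))
```

Every node accepts the same facet vocabulary. A client can therefore send the identical query to any node in the Federation.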
IPCC AR6 Released
● The Working Group I contribution to the United Nations’ Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6) was released on Monday, August 9, 2021
● All of the climate and Earth system model simulation output underpinning this report was produced by modeling centers participating in the sixth phase of the World Climate Research Programme’s (WCRP’s) Coupled Model Intercomparison Project (CMIP6)
● Nearly all of that model output was stored in and distributed to researchers via ESGF
● These data are about the future of life on Earth!
ESGF Holdings are Large and Growing
● CMIP5 totals >5 PB
● CMIP6 totals >20 PB
● We expect CMIP7 output, including higher-resolution simulations and more ensembles, to total >100 PB
● We plan to expand Federation holdings by adding other Earth science data projects
As of August 22, 2021
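The holdings figures imply a growth factor between phases. The sketch below does that arithmetic on the lower bounds quoted above, treating each ">" value as its minimum. Ratios of lower bounds are only indicative; they are not bounds on the true growth.

```python
# Lower bounds from the slide, in petabytes (">" values taken as minimums).
holdings_pb = {"CMIP5": 5, "CMIP6": 20, "CMIP7 (projected)": 100}


def growth_factors(holdings: dict[str, float]) -> list[tuple[str, float]]:
    """Ratio of each phase's holdings to the previous phase's holdings."""
    names = list(holdings)  # dicts keep insertion order
    return [
        (f"{prev} -> {cur}", holdings[cur] / holdings[prev])
        for prev, cur in zip(names, names[1:])
    ]


for step, factor in growth_factors(holdings_pb):
    print(f"{step}: x{factor:.0f}")
```

On these numbers, each new phase is at least several times larger than the last. From CMIP5 to the projected CMIP7, the lower bounds grow by a factor of 20.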
A New Consortium Project in the USA
● A new team from Oak Ridge National Laboratory, Argonne National Laboratory, and Lawrence Livermore National Laboratory proposed to modernize the data backplane based on the Globus platform
● The ESGF2 proposal was reviewed by a panel of 8 scientists on August 30–31, 2021, and was selected for funding by the US Department of Energy in September 2021
● In collaboration with the ESGF Executive Committee, we will develop and deploy a new architecture based on the Future Architecture Roadmap
● In addition, we will develop new data discovery tools and data access interfaces, server-side computing (subsetting & summarizing), and user computing (Kubernetes & JupyterHub) with improved user & system metrics
● We will add a Resource & Project Liaison group and a Science, User & Facility Advisory Board; hold outreach activities; and offer a help desk/user support
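"Server-side computing (subsetting & summarizing)" means reducing data where it lives instead of shipping whole files. The sketch below shows both steps on a regular latitude-longitude grid held in plain Python lists. Step one cuts the grid down to a bounding box. Step two returns a cos(latitude)-weighted mean of what remains. The function name and the grid layout are illustrative assumptions. A real service would more likely operate on NetCDF arrays with an array library.

```python
import math
from collections.abc import Sequence


def box_mean(
    field: Sequence[Sequence[float]],
    lats: Sequence[float],
    lons: Sequence[float],
    lat_range: tuple[float, float],
    lon_range: tuple[float, float],
) -> float:
    """Subset a regular lat-lon grid to a box and return its area-weighted mean.

    field[i][j] holds the value at (lats[i], lons[j]) in degrees. Each point is
    weighted by cos(latitude), which approximates cell area on a regular grid.
    Boxes that cross the longitude seam (e.g., 350 to 10 degrees) are not handled.
    """
    total = 0.0
    weight_sum = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue  # subsetting in latitude
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            if lon_range[0] <= lon <= lon_range[1]:  # subsetting in longitude
                total += weight * field[i][j]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("the box contains no grid points")
    return total / weight_sum
```

Run next to the data, a reduction like this returns a single number (or a small array) in place of the full files.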
DOE’s Current Earth System Grid Federation
● Primary server at LLNL
● Replicating data from the global Federation
● Independent data node at ANL
DOE’s Next Generation Earth System Grid Federation
● Co-located at DOE’s major computing facilities
● Replicating data from the global Federation
● Providing cloud indexing, automated migration, and tape archiving
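The slide does not say how automated migration and tape archiving will interact. One common pattern is an access-based policy that moves datasets left idle on disk to tape. The sketch below illustrates that idea under stated assumptions. The `Dataset` record, the identifiers, and the 180-day threshold are all hypothetical placeholders, not ESGF2 settings.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Dataset:
    dataset_id: str      # hypothetical identifier
    size_bytes: int
    last_access: date    # would come from access metrics


def plan_tape_migration(
    datasets: list[Dataset], today: date, idle_days: int = 180
) -> tuple[list[Dataset], int]:
    """Pick disk-resident datasets idle longer than `idle_days` for tape.

    Returns the datasets to archive and the disk bytes that would be freed.
    """
    cutoff = today - timedelta(days=idle_days)
    to_tape = [d for d in datasets if d.last_access < cutoff]
    return to_tape, sum(d.size_bytes for d in to_tape)


holdings = [
    Dataset("ds-cold", 3 * 10**12, date(2021, 6, 1)),
    Dataset("ds-hot", 5 * 10**12, date(2022, 4, 30)),
]
archive, freed = plan_tape_migration(holdings, today=date(2022, 5, 4))
print([d.dataset_id for d in archive], freed)
```

In this sketch, the access metrics supply `last_access`. That ties the policy to the "metrics drive data management" principle without fixing any specific threshold.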
Design and implementation principles
● Open architecture and protocols
○ Enable substitution of alternative implementations
● Leverage highly available and scalable central services from Globus
○ Reduce complexity, increase reliability, provide economies of scale
● Use proven, modern security technologies and practices
○ Integrated access control; protect against attacks and intrusions
● Use-case-driven approach to design, implementation, and evaluation
○ Ensure that solutions meet real user needs
● Integrated instrumentation
○ Metrics drive data management, data access features, capability development
● Focus on performance to deal with big data
○ High-speed data transfer, search, server-side processing
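The first and fifth principles (open interfaces that allow alternative implementations, and integrated instrumentation) can be shown together in a small sketch. Client code depends only on an interface. A toy in-memory backend and a metrics-recording wrapper then both satisfy that interface and can be swapped freely. Every class and method name here is hypothetical. None of them is part of ESGF2 or Globus.

```python
from typing import Protocol


class SearchBackend(Protocol):
    """Hypothetical minimal interface that any search implementation provides."""

    def search(self, **facets: str) -> list[str]:
        """Return identifiers of datasets whose metadata match all facets."""
        ...


class InMemoryIndex:
    """Toy backend holding metadata records in memory."""

    def __init__(self, records: list[dict[str, str]]) -> None:
        self.records = records

    def search(self, **facets: str) -> list[str]:
        return [
            r["id"]
            for r in self.records
            if all(r.get(key) == value for key, value in facets.items())
        ]


class InstrumentedIndex:
    """Wraps any backend and records each query, e.g., to feed usage metrics."""

    def __init__(self, inner: SearchBackend) -> None:
        self.inner = inner
        self.queries: list[dict[str, str]] = []

    def search(self, **facets: str) -> list[str]:
        self.queries.append(dict(facets))
        return self.inner.search(**facets)


def find_datasets(backend: SearchBackend, **facets: str) -> list[str]:
    # Client code depends only on the protocol, so backends can be swapped.
    return backend.search(**facets)


records = [
    {"id": "ds-1", "project": "CMIP6", "variable_id": "tas"},
    {"id": "ds-2", "project": "CMIP5", "variable_id": "tas"},
]
print(find_datasets(InstrumentedIndex(InMemoryIndex(records)), project="CMIP6"))
```

Because `SearchBackend` is a structural protocol, a different implementation (for example, a hosted index service) could replace `InMemoryIndex` without changes to `find_datasets`.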
Enabling a new level of research productivity
Logging in with her institutional credentials, Samantha is presented with new data, code, and papers relevant to her current research. Intrigued by a new report on extreme precipitation events, she examines a Jupyter notebook that implements the method used. Wondering how this method would work with higher-resolution E3SM data, she quickly locates the required datasets and runs the notebook on a subset. The results are promising, so she shares them with collaborators via ESGF2 federated storage, and they agree that a larger ensemble analysis is called for. ESGF2 confirms that the full ensemble data are available at OLCF, so they submit a request to execute the analysis there. Within 24 hours, the results have been published to ESGF2 for broader consumption, along with the notebook used to produce and validate them.
Flood risk increases with water availability
ESnet Global Connectivity
Global ESnet interconnectivity, including high-speed connections to London, Amsterdam, and Geneva, will enable rapid data replication across most of the Federation data nodes
ESGF2 will make use of the high bandwidth between DOE labs and HPC centers
Data will be automatically migrated and cached across the ORNL, ANL, and LLNL sites
An ESnet representative is part of our Resource & Project Liaison group
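The slide does not specify how data will be "automatically cached" at each site. A standard starting point is a size-bounded least-recently-used (LRU) cache. When a new dataset would exceed the site's disk budget, the datasets untouched for the longest time are evicted first. The sketch below is hypothetical throughout, including the class name and the byte-capacity model.

```python
from collections import OrderedDict


class SiteCache:
    """Size-bounded LRU cache of datasets at one site (hypothetical sketch)."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # dataset_id -> size in bytes, oldest first

    def access(self, dataset_id: str, size_bytes: int) -> list[str]:
        """Record an access, caching the dataset if needed; return evicted ids."""
        if dataset_id in self.entries:
            self.entries.move_to_end(dataset_id)  # now the most recently used
            return []
        if size_bytes > self.capacity:
            return []  # too large for this site; serve from a remote replica
        evicted = []
        while self.used + size_bytes > self.capacity:
            old_id, old_size = self.entries.popitem(last=False)  # least recently used
            self.used -= old_size
            evicted.append(old_id)
        self.entries[dataset_id] = size_bytes
        self.used += size_bytes
        return evicted
```

Real placement decisions would probably also weigh replica availability and link bandwidth between sites. Pure recency is only the simplest baseline.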
Data Discovery Platform: Architecture
Governance
Outreach Activities
● Organize Webinars, Tutorials, and ESGF2 Bootcamps
○ Data management lessons learned
○ Ingest best practices
○ Data discovery and access
● Hackathons and Workshops
○ Data standards
○ Data node deployment
○ User compute resources
○ Hold these at large, relevant conferences, e.g., the AGU Fall Meeting, EGU, and the AMS Annual Meeting
● Organize and host an annual ESGF Developer and User Conference
User Support
● Support has traditionally been given via a single email list
○ Volunteer basis from collaborating projects
● Need to formalize process for a “Helpdesk”
○ Introduce ticketing system for user inquiries
■ e.g., ServiceNow or RT; leverage site resources
○ Assign dedicated support staff (need to triage)
○ Get help from Federation partners for responses
○ Maintain quality documentation for everything user-facing
○ Walkthroughs for GUI-based tools / webapps
○ Explore additional tools to provide support
■ GitHub Issues (when associated with software “bugs”)
■ Discourse (requires a subscription)
■ Slack (invite single-channel guests as needed)
ESGF Failsafe Data Replication
● In the US, LLNL operates the primary ESGF node, which replicates much of the CMIP6 and related model output from around the globe
● Since the data at LLNL are stored only on spinning disk, we decided to replicate the entire ~7.5 PB collection to Argonne National Laboratory (ANL) and Oak Ridge National Laboratory (ORNL)
● Solution: Use Globus to transfer all the data over ESnet
● We used custom Globus scripting (thanks to Lukasz Lacinski), ESnet network monitoring and diagnostics (thanks to Eli Dart), optimized data transfer node (DTN) and GPFS configurations (thanks to Cameron Harr and others), and debugging and problem-solving (thanks to Sasha Ames, Lee Liming, and others)
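The slide does not describe what the custom scripting did. When a migration involves tens of millions of files, one plausible approach is to split the file list into bounded batches. Each batch can then be submitted, monitored, and retried as its own transfer task, for instance as one `TransferData` document in the Globus Python SDK with one `add_item` call per file. The sketch below shows only the batching step. The function name and both caps are hypothetical tuning knobs, not the settings used for the ESGF replication.

```python
from collections.abc import Iterable, Iterator


def batch_files(
    files: Iterable[tuple[str, int]],
    max_files: int,
    max_bytes: int,
) -> Iterator[list[tuple[str, int]]]:
    """Group (path, size) pairs into batches bounded by file count and bytes.

    A single file larger than max_bytes gets a batch of its own rather than
    being dropped, so every input file appears in exactly one batch.
    """
    batch: list[tuple[str, int]] = []
    batch_bytes = 0
    for path, size in files:
        # Close the current batch if adding this file would exceed a cap.
        if batch and (len(batch) >= max_files or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append((path, size))
        batch_bytes += size
    if batch:
        yield batch
```

As a rough illustration of scale, a hypothetical cap of 100,000 files per task would split the 28,907,532 files reported below into at least 290 tasks.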
Globus ESGF transfer dashboard (https://dashboard.globus.org/esgf), as of March 10, 2022; chart annotations: 1.5 GB/s and 4 to 6 GB/s
Globus ESGF transfer dashboard (https://dashboard.globus.org/esgf), as of May 4, 2022; chart annotations: 1.5 GB/s and 4 to 6 GB/s
7.5 PB transferred between mid-February and May 4, 2022
17,347,671 directories and 28,907,532 files
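The rates and dates above allow a quick back-of-the-envelope check. The sketch below converts the annotated rates into the wall-clock time needed to move 7.5 PB. It also computes the average rate implied by moving 7.5 PB over the stated period. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and takes "mid-February" to mean February 15, 2022. The slide does not say whether the 7.5 PB figure counts one destination or both.

```python
from datetime import date

PB = 10**15  # decimal petabyte in bytes (assumption)
GB = 10**9   # decimal gigabyte in bytes (assumption)


def days_to_transfer(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Wall-clock days to move a volume at a constant sustained rate."""
    return volume_bytes / rate_bytes_per_s / 86_400


def average_rate(volume_bytes: float, start: date, end: date) -> float:
    """Average bytes per second needed to move a volume between two dates."""
    return volume_bytes / ((end - start).days * 86_400)


volume = 7.5 * PB
for rate_gb in (1.5, 4.0, 6.0):
    print(f"{rate_gb} GB/s -> {days_to_transfer(volume, rate_gb * GB):.1f} days")

# "Mid-February" is taken as February 15, 2022 (assumption).
avg = average_rate(volume, date(2022, 2, 15), date(2022, 5, 4))
print(f"average over the period: {avg / GB:.2f} GB/s")
```

At a sustained 4 to 6 GB/s, one 7.5 PB copy takes roughly two to three weeks. At 1.5 GB/s it takes close to two months. Spreading 7.5 PB over the 78 days from February 15 to May 4 implies an average of about 1.1 GB/s.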
Cumulative Data Transferred Over Time
Transfer Rates Over Time
Summary
● The next generation Earth System Grid Federation (ESGF2)
○ Will be designed for an order of magnitude increase in data sizes
○ Will be highly available, scalable, and fast
○ Will automatically migrate data as needed
○ Will have improved data discovery and sharing tools
○ Will offer server-side computing for derived data
○ Will offer user computing capabilities (e.g., JupyterHub/JupyterLab) near the data
● The Globus platform is expected to provide many of the central services of the ESGF2 data backplane
● Globus is already playing a key role in data management and data distribution
● We used Globus to make two redundant copies of the 7.5 PB of ESGF data via ESnet in less than 3 months
