2024 ESIP Summer Meeting
Accessing HDF Data in
the Cloud via OPeNDAP
Web Service
Kent Yang
Software Engineer/NASA EED-3 contractor
myang6@hdfgroup.org
GOVERNMENT RIGHTS NOTICE
This work was authored by employees of The HDF Group under Contract No. 80GSFC21CA001 with the National Aeronautics and Space
Administration. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United
States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to reproduce, prepare derivative works, distribute copies to
the public, and perform publicly and display publicly, or allow others to do so, for United States Government purposes. All other rights are
reserved by the copyright owner.
©2024 Raytheon Company. All rights reserved.
Topics Overview
● Accessing HDF* Data in the Cloud via dmrpp**
● Direct IO*** Performance Improvement
● Work in progress to access NASA HDF4 and HDF-EOS****2 files
*Hierarchical Data Format
** Dataset Metadata Response Plus Plus
*** Input Output
**** Earth Observing System
Direct IO Performance Improvement Concept
[Figure: two data-flow diagrams through the Hyrax core]
General Approach: HDF5 File + dmrpp File → Hyrax Core (Decompress, Compress) → NetCDF* File
Approach with Direct IO: HDF5 File + dmrpp File → Hyrax Core (Pass through the data) → NetCDF File
* Network Common Data Form
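The difference between the two paths can be sketched in a few lines. This is only an illustration of the concept: Python's zlib stands in for the HDF5 deflate filter, and the function names general_approach and direct_io_approach are hypothetical, not taken from Hyrax.

```python
import zlib

def make_compressed_chunks(raw_chunks, level=6):
    """Stand-in for compressed chunks as they sit in the source file."""
    return [zlib.compress(c, level) for c in raw_chunks]

def general_approach(compressed_chunks, level=6):
    """Decompress every chunk, then compress it again for the output file."""
    out = []
    for chunk in compressed_chunks:
        raw = zlib.decompress(chunk)           # CPU work and a full-size buffer
        out.append(zlib.compress(raw, level))  # more CPU work
    return out

def direct_io_approach(compressed_chunks):
    """Pass the already-compressed bytes through untouched."""
    return list(compressed_chunks)

# Eight synthetic 4 KiB chunks (made-up data).
raw_chunks = [bytes([i]) * 4096 for i in range(8)]
stored = make_compressed_chunks(raw_chunks)

via_general = general_approach(stored)
via_direct = direct_io_approach(stored)
```

Both paths yield chunks that decode to the same values. The pass-through path does so without decompressing or recompressing anything, which is where the savings in server time and memory would come from.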
Hyrax Server Response Time Speed-up With Direct IO
Product Sample File | File Size (MB) | Response Time without Direct IO (Seconds) | Response Time with Direct IO (Seconds) | Speed-up in Response Time by using Direct IO
GHRSST*    | 9   | 2.8  | 0.2 | 14 X
TROPOMI**  | 292 | 26.6 | 1.8 | 15 X
SSMI***    | 1.4 | 0.5  | 0.3 | 1.7 X
*: Group for High Resolution Sea Surface Temperature
**: TROPOspheric Monitoring Instrument
***: Special Sensor Microwave Imager
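The speed-up column is the ratio of the two response times. A small sketch that recomputes it from the table values (the variable and function names are illustrative):

```python
# (product, seconds without Direct IO, seconds with Direct IO), from the table
measurements = [
    ("GHRSST", 2.8, 0.2),
    ("TROPOMI", 26.6, 1.8),
    ("SSMI", 0.5, 0.3),
]

def speed_up(without_s, with_s):
    """Ratio of response time without Direct IO to response time with it."""
    return without_s / with_s

ratios = {name: speed_up(w, d) for name, w, d in measurements}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

After rounding, the ratios match the table: about 14 for GHRSST, about 15 for TROPOMI, and about 1.7 for SSMI.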
Big Files With Direct IO
Product Sample File | File Size (GB) | Response Time with Direct IO (Seconds) | Server Response Message Without Using Direct IO
Daymet         | 3.5 | 45 | The maximum response time limit (165 seconds) is exceeded.
MODIS* Derived | 4   | 65 | Insufficient memory
CH4 Level 4    | 11  | Direct IO feature is not used because it doesn’t contain any compressed variable. | The maximum response time limit (165 seconds) is exceeded.
* Moderate Resolution Imaging Spectroradiometer
Facts for the Direct IO Feature
● Hyrax uses Direct IO automatically when end users request the whole
array of the selected variable(s) and those variable(s) are compressed.
● This process is entirely transparent to the end users.
● Direct IO doesn’t work for some old dmrpp files that don’t contain
the key information the feature needs. These dmrpp files need to be
regenerated to take advantage of the Direct IO feature.
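To make the "whole array" condition concrete, here is a hedged sketch of two client requests. The server address, file path, and variable name are hypothetical. The `.dap.nc4` suffix and the `dap4.ce` constraint parameter are assumed to follow Hyrax's DAP4 URL conventions, and the predicate only restates the rule from the bullets above, not Hyrax's actual implementation.

```python
from urllib.parse import quote

# Hypothetical server and dataset path, used only for illustration.
BASE = "https://example.org/opendap/data/sample.h5"

def dap4_nc4_url(base, constraints):
    """Build a DAP4 netCDF-4 request URL from {variable: index_subset} pairs.

    An empty subset string means the whole array is requested.
    """
    ce = ";".join(f"/{var}{subset}" for var, subset in constraints.items())
    return f"{base}.dap.nc4?dap4.ce={quote(ce, safe='/;:')}"

def direct_io_applies(constraints, compressed_vars):
    """Restates the rule: whole arrays only, and every variable compressed."""
    return all(subset == "" and var in compressed_vars
               for var, subset in constraints.items())

whole = {"sst": ""}                  # whole array of variable sst
subset = {"sst": "[0:1:9][0:1:9]"}   # a 10 x 10 corner of the same variable

whole_url = dap4_nc4_url(BASE, whole)
subset_url = dap4_nc4_url(BASE, subset)
print(whole_url)
print(subset_url)
```

Under these assumptions only the first request would qualify for Direct IO. The second asks for part of the array, so the chunks would have to be decompressed to extract the subset.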
Direct IO Performance Improvement Summary
● Can greatly reduce server computation time
● Can greatly reduce server memory usage
● Uses the HDF5 direct chunk IO APIs*
● The feature is in the current Hyrax release
* Application Programming Interface
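For readers unfamiliar with direct chunk IO, the sketch below copies compressed chunks between two HDF5 datasets without running the compression filter. It uses the h5py bindings (`read_direct_chunk` and `write_direct_chunk`) with NumPy as an illustration only. It is not the code Hyrax uses, and the file names, dataset name, and sizes are made up.

```python
import h5py
import numpy as np

data = np.arange(100, dtype="i4")
chunk = 25

# In-memory files so the sketch leaves nothing on disk.
with h5py.File("src.h5", "w", driver="core", backing_store=False) as src, \
     h5py.File("dst.h5", "w", driver="core", backing_store=False) as dst:
    s = src.create_dataset("v", data=data, chunks=(chunk,), compression="gzip")
    # The destination must declare the same chunking and filter pipeline.
    d = dst.create_dataset("v", shape=data.shape, dtype=data.dtype,
                           chunks=(chunk,), compression="gzip")
    for start in range(0, data.size, chunk):
        # Read the chunk exactly as stored (still compressed) ...
        filter_mask, raw = s.id.read_direct_chunk((start,))
        # ... and write those bytes without running the filters again.
        d.id.write_direct_chunk((start,), raw, filter_mask=filter_mask)
    # A normal read decompresses the copied chunks for verification.
    copied = d[...]
```

The loop never holds an uncompressed chunk. Decompression happens only in the final verification read, which a pass-through server would leave to the client.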
Accessing HDF4 and HDF-EOS2 via dmrpp
● Map HDF4 to DMR*
● Access HDF4 via dmrpp
○ Handle data stored in chunked and contiguous layouts
○ Also handle data stored in linked blocks
○ Handle HDF-EOS2/HDF4 geolocation data
■ Not stored as HDF4 variables
■ Need to be calculated from the metadata information
■ Need to be saved in a proper way
* Dataset Metadata Response
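Handling linked blocks means that one variable's bytes may be spread across several separate byte ranges in the file. A minimal sketch of reassembling them, assuming the block list is already known as (offset, length) pairs; the function name, the fake file, and the offsets are all invented for illustration:

```python
import io

def read_linked_blocks(fileobj, blocks):
    """Concatenate the byte ranges listed in blocks, in order.

    blocks is a list of (offset, length) pairs; the values used below are
    made up and do not come from a real HDF4 file.
    """
    parts = []
    for offset, length in blocks:
        fileobj.seek(offset)
        part = fileobj.read(length)
        if len(part) != length:
            raise IOError(f"short read at offset {offset}")
        parts.append(part)
    return b"".join(parts)

# A fake "file": the variable's data is split into three blocks that are
# separated by unrelated bytes.
fake_file = io.BytesIO(b"HEAD" + b"abcd" + b"xx" + b"efgh" + b"yyy" + b"ij")
blocks = [(4, 4), (10, 4), (17, 2)]
payload = read_linked_blocks(fake_file, blocks)
```

Against cloud object storage, each (offset, length) pair would presumably become a ranged read instead of a seek, but the reassembly step is the same.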
Current Status
● We can successfully map and access the sample NASA HDF4 and
HDF-EOS2 products via dmrpp.
● We are still working on a better way to store HDF-EOS2/HDF4
geolocation data.
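Because the geolocation values are calculated from metadata rather than read from the file, a small sketch may help show what that calculation can look like. It assumes a geographic (latitude/longitude) grid, corner coordinates already expressed in decimal degrees, and cell-center registration. The metadata values and names are hypothetical, and this is not the dmrpp module's method.

```python
def cell_centers(first_edge, last_edge, count):
    """Evenly spaced cell-center coordinates between two grid edges."""
    step = (last_edge - first_edge) / count
    return [first_edge + (i + 0.5) * step for i in range(count)]

# Hypothetical grid metadata: corner coordinates in decimal degrees.
upper_left = (-180.0, 90.0)    # (lon, lat)
lower_right = (180.0, -90.0)   # (lon, lat)
xdim, ydim = 4, 2              # number of cells along each axis

lons = cell_centers(upper_left[0], lower_right[0], xdim)
lats = cell_centers(upper_left[1], lower_right[1], ydim)
```

For a grid like this, the latitude and longitude arrays are one-dimensional and small, so they are cheap to compute and to store. Other projections need a projection library and generally produce two-dimensional arrays.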
● Panoply screenshots of variable Topography from an AIRS* file
○ The identical plots show the dmrpp module can successfully access this HDF4 file
[Figure: two Panoply plots, captioned "Local HDF4 file" and "netCDF-4 file via dmrpp"]
* Atmospheric Infrared Sounder
Thank you!
This work was supported by NASA/GSFC under
Raytheon Company contract number
80GSFC21CA001
