SESIP-0722-KY
HDF5 OPeNDAP Handler Updates,
and Performance Discussion
2022 ESIP Summer Meeting
This work was supported by NASA/GSFC under Raytheon Technologies contract number 80GSFC21CA001.
This document does not contain technology or Technical Data controlled under either the U.S. International Traffic
in Arms Regulations or the U.S. Export Administration Regulations.
Kent Yang
Software Engineer/NASA EED-3 contractor
myang6@hdfgroup.org
HDF5 OPeNDAP Handler History
• 2001: A prototype of the HDF5 data handler
  – HDF5 to DAP2: Default option
• 2008: Handler in production
  – Climate and Forecast (CF) option:
    • Translate HDF5 metadata to follow CF
• 2008-2018: Significant improvement
  – Still HDF5 to DAP2
HDF: Hierarchical Data Format
OPeNDAP: Open-source Project for a Network Data Access Protocol
DAP: Data Access Protocol
HDF5 OPeNDAP Handler Update
• Support DAP4
  – CF option
    • Support 8-bit and 64-bit integer mapping
  – Default option
    • Support NetCDF data model (groups, etc.)
• Documentation
  – A comprehensive user's guide on GitHub
    • https://github.com/OPENDAP/hyrax_guide/blob/master/handlers/BES_Modules_The_HDF5_Handler.adoc
NetCDF: Network Common Data Form
HDF5 Handler Performance Study
• Output a NetCDF file via the handler
  – Sometimes it is very slow
[Figure: Hyrax core data flow: HDF5 file → HDF5 handler → File NetCDF → NetCDF file]
HDF5 Handler Performance Study
• The slowdown occurs because the HDF5 variables are compressed.
HDF5 Handler Performance Study
• How compressed variables are processed
  – HDF5 handler: Decompress via H5Dread
  – File NetCDF: Compress via H5Dwrite
[Figure: Hyrax core data flow: HDF5 file → HDF5 handler (Decompress) → File NetCDF (Compress) → NetCDF file]
HDF5 Handler Performance Study
• Compression/decompression is costly
• Solution
  – Pass the compressed data through
[Figure: Hyrax core data flow: HDF5 file → HDF5 handler → File NetCDF → NetCDF file; the Decompress and Compress labels appear alongside "Pass through the data" at both stages]
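The difference between the two paths can be illustrated with a small stand-alone sketch. It uses Python's zlib module as a stand-in for the HDF5 deflate filter and a synthetic byte buffer in place of a real variable; the function names are hypothetical and nothing here calls HDF5, Hyrax, or NetCDF.

```python
import time
import zlib


def standard_way(compressed_chunk: bytes, level: int = 6) -> bytes:
    """Decompress on read, then recompress on write (two codec passes)."""
    raw = zlib.decompress(compressed_chunk)
    return zlib.compress(raw, level)


def pass_through(compressed_chunk: bytes) -> bytes:
    """Hand the stored bytes to the writer untouched (no codec pass)."""
    return compressed_chunk


def timed(func, *args):
    """Return the function result and the elapsed wall clock time."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Synthetic stand-in for one compressed variable (not GHRSST or MERRA-2 data).
raw_values = bytes(range(256)) * 40_000  # about 10 MB uncompressed
stored = zlib.compress(raw_values, 6)

recoded, t_standard = timed(standard_way, stored)
passed, t_pass = timed(pass_through, stored)

# Both outputs decode to the same values; only the work done differs.
assert zlib.decompress(recoded) == raw_values
assert zlib.decompress(passed) == raw_values
print(f"standard: {t_standard:.4f} s, pass-through: {t_pass:.6f} s")
```

The timings depend on the machine and the data, so the sketch prints them rather than claiming a particular ratio.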
HDF5 Handler Performance Study
[Figure: Hyrax core data flow: HDF5 file → HDF5 handler → File NetCDF → NetCDF file; both stages labeled "Pass through the data"]
• Is this possible?
• A proof-of-concept study
HDF5 Handler Performance Study
• A proof-of-concept study
  – Use the HDF5 direct chunk I/O APIs
• Packages that need to be updated
  – HDF5 handler
    • Read the compressed data for pass-through
  – DAP library
    • Pass through the variable storage information
  – NetCDF-4
    • Write the passed-through compressed data
I/O: Input/Output
API: Application Programming Interface
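The division of labor among the three packages can be sketched as a toy model. The HDF5 C library provides direct chunk functions (H5Dread_chunk and H5Dwrite_chunk) that move a chunk's stored bytes without running the filter pipeline; the sketch below does not call them. The StoredChunk record, its fields, and the dictionary "files" are all hypothetical, chosen only to show which storage information has to travel with the bytes.

```python
from dataclasses import dataclass
import zlib


@dataclass(frozen=True)
class StoredChunk:
    """Hypothetical record of what travels with one pass-through chunk."""
    variable: str
    chunk_shape: tuple   # chunk dimensions, needed to recreate the layout
    dtype: str           # element type, e.g. "uint8"
    deflate_level: int   # compression setting to declare in the output
    payload: bytes       # compressed bytes exactly as stored in the source


def read_stored_chunk(source: dict, variable: str) -> StoredChunk:
    """Reader role (the handler): fetch the bytes without decoding them."""
    entry = source[variable]
    return StoredChunk(variable, entry["chunk_shape"], entry["dtype"],
                       entry["deflate_level"], entry["payload"])


def write_stored_chunk(target: dict, chunk: StoredChunk) -> None:
    """Writer role (the NetCDF-4 side): declare the same storage, copy bytes."""
    target[chunk.variable] = {
        "chunk_shape": chunk.chunk_shape,
        "dtype": chunk.dtype,
        "deflate_level": chunk.deflate_level,
        "payload": chunk.payload,
    }


# Toy in-memory "files"; the variable name and values are made up.
values = bytes(1000) + bytes(range(200))  # 1200 bytes = 30 x 40 uint8 values
source_file = {"sst": {"chunk_shape": (30, 40), "dtype": "uint8",
                       "deflate_level": 4,
                       "payload": zlib.compress(values, 4)}}
output_file = {}

# The StoredChunk object plays the role of the DAP library in the middle.
write_stored_chunk(output_file, read_stored_chunk(source_file, "sst"))

# The output holds byte-identical compressed data that still decodes correctly.
assert output_file["sst"]["payload"] == source_file["sst"]["payload"]
assert zlib.decompress(output_file["sst"]["payload"]) == values
```

The point of the model is that the writer can only reuse the bytes if it is told how they were stored, which is why the middle layer must carry the storage information along with the payload.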
HDF5 Handler Performance Study
• Test files used
  – GHRSST and MERRA-2 data
    • Repacked to one chunk per variable
• Test approach
  – Only the Hyrax Back-End Server (BES)
  – besstandalone program on a Linux server
  – Measure the wall clock time to output a NetCDF-4 file
GHRSST: Group for High Resolution Sea Surface Temperature
MERRA: Modern-Era Retrospective analysis for Research and Applications
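A wall clock measurement of this kind can be sketched with a small wrapper around a command-line run. The helper name is hypothetical, and the command shown is a placeholder that simply starts and exits a Python interpreter so the sketch runs anywhere; the besstandalone invocation used in the study is not given in the slides and is not reproduced here.

```python
import subprocess
import sys
import time


def wall_clock_seconds(command: list) -> float:
    """Run a command to completion and return the elapsed wall clock time."""
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


# Placeholder command; substitute the real server command for a measurement.
elapsed = wall_clock_seconds([sys.executable, "-c", "pass"])
print(f"{elapsed:.3f} s")
```

Repeating the run several times and reporting the spread would make a comparison less sensitive to file-system caching.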
HDF5 Handler Performance Study
• Test files
  – GHRSST
    • File size: 237 MB
    • About 20 variables
    • 5392x3200 8-bit or 16-bit integer
  – MERRA-2
    • File size: 489 MB
    • About 50 variables
    • 24x361x576 32-bit floating-point
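For a sense of scale, the uncompressed size of one variable follows from its dimensions and element width. The short calculation below assumes every variable has the full dimensions listed, which the slides do not state; with one chunk per variable, this is also the amount of data a single decompress or compress call has to handle.

```python
from math import prod


def uncompressed_bytes(shape: tuple, bytes_per_value: int) -> int:
    """Size of one variable in bytes before compression."""
    return prod(shape) * bytes_per_value


ghrsst_8bit = uncompressed_bytes((5392, 3200), 1)    # 17,254,400 bytes
ghrsst_16bit = uncompressed_bytes((5392, 3200), 2)   # 34,508,800 bytes
merra2_float = uncompressed_bytes((24, 361, 576), 4) # 19,961,856 bytes

for name, size in [("GHRSST 8-bit", ghrsst_8bit),
                   ("GHRSST 16-bit", ghrsst_16bit),
                   ("MERRA-2 32-bit float", merra2_float)]:
    print(f"{name}: {size / 1e6:.1f} MB")
```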
Performance Study Results
• Performance improved ~30 times (MERRA-2) and ~17 times (GHRSST) compared to the standard way

Wall clock time (seconds)                          MERRA-2   GHRSST
Standard way (decompress and compress the data)    55        26
Pass through the compressed data                   1.8       1.5
Speedup                                            ~30       ~17

• Credit to the HDF5 library.
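The rounded speedups follow directly from the wall clock times in the table, as this short check shows (the dictionary layout is only for illustration):

```python
# Wall clock times in seconds, taken from the results table.
times = {
    "MERRA-2": {"standard": 55.0, "pass_through": 1.8},
    "GHRSST": {"standard": 26.0, "pass_through": 1.5},
}

speedups = {name: t["standard"] / t["pass_through"] for name, t in times.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```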
