HDF5 2.0: Cloud Optimized from the Start
2025 ESIP Summer Meeting
Aleksandar Jelenak
Acknowledgments
• Features presented here are the work of open-source community contributors and
HDF Group (HDFG) staff, supported by several projects.
• Project honorable mentions:
• NASA/GSFC Raytheon Company contract number 80GSFC21CA001
• Department of Energy, Office of Science, Office of Fusion Energy Sciences, Award Number DE-SC0024442
HDF5 Library 2.0
• Semantic versioning of releases
• CMake build system only
• Complex numbers and AI/ML datatypes
• Default settings changed to benefit cloud optimized HDF5
• New S3 backend for Read-Only S3 (ROS3) driver
• HDF5 filter availability improvements
Compliant semantic versioning of library releases
• MAJOR.MINOR.PATCH
• In all releases to date the MAJOR number has been 1, making it effectively meaningless.
• Major version:
• HDF5 file format change.
• Any change that introduces application binary interface (ABI) incompatibility.
• Minor version:
• Any change not deemed worthy of a major release, e.g., new API, improved performance, etc.
• Patch version:
• Security or memory leak fixes.
• Don’t worry, just upgrade.
• Minor and patch releases are intended to be drop-in upgrades.
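Under this scheme, only a MAJOR bump can break the application binary interface. A minimal C sketch of checking the compile-time version against the linked library at runtime (H5get_libversion and the H5_VERS_* macros are long-standing HDF5 API):

#include <stdio.h>
#include "hdf5.h"

int main(void) {
    unsigned maj, min, rel;

    /* Version of the library actually linked at runtime */
    H5get_libversion(&maj, &min, &rel);
    printf("built against %d.%d.%d, running %u.%u.%u\n",
           H5_VERS_MAJOR, H5_VERS_MINOR, H5_VERS_RELEASE, maj, min, rel);

    /* With semantic versioning, only a MAJOR mismatch signals possible ABI breakage */
    if (maj != H5_VERS_MAJOR)
        fprintf(stderr, "warning: major version mismatch; ABI may differ\n");
    return 0;
}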
CMake build system only
• Support for building the library with Autotools has been removed.
• Supporting both CMake and Autotools was a big drain on HDFG resources.
• Build configurations are available as CMake presets with the library source code.
• CMake presets are in a JSON document and define configurations for building, testing, and
packaging.
• You can personalize build settings with user CMake presets in a separate JSON file
that reuses/incorporates the already available presets (see the sketch below).
• Blog about building HDF5 library 2.0-dev and h5py in a conda environment:
https://www.hdfgroup.org/2025/07/22/how-to-build-hdf5-library-and-h5py-in-a-conda-virtual-environment-update/
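As an illustration, a user preset in CMakeUserPresets.json (kept next to the library's CMakePresets.json) can inherit a shipped preset and override a few settings. The inherited preset name and cache variables below are assumptions for this sketch, not a definitive recipe:

{
  "version": 6,
  "configurePresets": [
    {
      "name": "my-hdf5",
      "inherits": "ci-StdShar-GNUC",
      "cacheVariables": {
        "CMAKE_INSTALL_PREFIX": "/opt/hdf5-2.0",
        "HDF5_BUILD_FORTRAN": "OFF"
      }
    }
  ]
}

Configuring then becomes: cmake --preset my-hdf5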
Complex number datatype
• The C99 standard float _Complex, double _Complex and long double
_Complex datatypes for platforms and compilers that support them.
• New HDF5 datatypes: H5T_NATIVE_FLOAT_COMPLEX,
H5T_NATIVE_DOUBLE_COMPLEX, H5T_NATIVE_LDOUBLE_COMPLEX,
H5T_COMPLEX_IEEE_F32, H5T_COMPLEX_IEEE_F64.
• Existing complex number data is grandfathered in when stored as HDF5 compound
datatypes of this specific form:
H5T_COMPOUND {
<float_type> "r(e)(a)(l)"; OFFSET 0
<float_type> "i(m)(a)(g)(i)(n)(a)(r)(y)"; OFFSET SIZEOF("r(e)(a)(l)")
}
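With the new datatypes, writing complex data needs no compound bookkeeping. A minimal C sketch, assuming the new constants behave like other predefined HDF5 datatypes (file and dataset names are arbitrary):

#include <complex.h>
#include "hdf5.h"

int main(void) {
    double _Complex z[4] = {1.0 + 2.0*I, 3.0 - 1.0*I, 0.0 + 0.0*I, 0.0 - 2.5*I};
    hsize_t dims[1] = {4};

    hid_t file  = H5Fcreate("complex.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    /* New in HDF5 2.0: predefined complex datatype instead of an ad hoc compound */
    hid_t dset  = H5Dcreate2(file, "z", H5T_NATIVE_DOUBLE_COMPLEX, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    H5Dwrite(dset, H5T_NATIVE_DOUBLE_COMPLEX, H5S_ALL, H5S_ALL, H5P_DEFAULT, z);
    H5Dclose(dset); H5Sclose(space); H5Fclose(file);
    return 0;
}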
AI/ML datatypes
• New predefined floating-point datatypes defined in Open Compute Project
Microscaling (MX) Specification v1.0.
• H5T_FLOAT_F8E4M3, H5T_FLOAT_F8E5M2, H5T_FLOAT_F6E2M3,
H5T_FLOAT_F6E3M2, H5T_FLOAT_F4E2M1
• Google Brain 16-bit float (bfloat16): H5T_FLOAT_BFLOAT16
• Compiler support is still patchy, so for now the library provides soft (software)
conversion, which is slower.
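A sketch of how soft conversion would be used in practice: the dataset is created with the new file datatype while the in-memory buffer stays native float, letting the library narrow the values on write (names below are arbitrary):

#include "hdf5.h"

int main(void) {
    float weights[8] = {0.1f, 0.2f, 0.3f, 0.4f, 0.5f, 0.6f, 0.7f, 0.8f};
    hsize_t dims[1] = {8};

    hid_t file  = H5Fcreate("model.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    /* File datatype bfloat16, memory datatype native float: the library
       soft-converts on write (slower than hardware-supported conversion) */
    hid_t dset  = H5Dcreate2(file, "weights", H5T_FLOAT_BFLOAT16, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, weights);
    H5Dclose(dset); H5Sclose(space); H5Fclose(file);
    return 0;
}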
Changed default settings for cloud optimized HDF5
• Dataset chunk cache size increased to 8 MiB from 1 MiB.
• File page cache size set to 64 MiB when ROS3 driver is used. Before: zero (no page
caching).
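These are only defaults; both caches remain tunable per file through long-standing property-list calls. A minimal C sketch with arbitrary sizes:

#include "hdf5.h"

int main(void) {
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

    /* Raw-data chunk cache: 521 hash-table slots, 16 MiB, 0.75 preemption policy */
    H5Pset_cache(fapl, 0, 521, 16 * 1024 * 1024, 0.75);

    /* File page cache: 128 MiB; most effective with paged-aggregation
       (cloud optimized) HDF5 files */
    H5Pset_page_buffer_size(fapl, 128 * 1024 * 1024, 0, 0);

    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, fapl);
    /* ... read datasets ... */
    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}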
New ROS3 backend
• ROS3 driver now uses AWS-C-S3 library to communicate with the AWS S3
service.
• Built-in support for AWS config/credential files and AWS environment variables.
• Handling of non-fatal failed S3 requests.
• Smart network utilization.
• Support for S3 URI (h5ls s3://mybucket/myfile.h5)
• Features already available in recent library releases:
• First 16 MiB of the file cached during its open operation.
• A non-zero file page cache no longer causes a library error when opening typical (non-paged) HDF5 files.
• Bugfix: eliminated an extra S3 request for the first 16 MiB of the file that was already cached.
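Put together, with credentials in the standard AWS config/credential files, reading a file over ROS3 can be a one-liner (bucket, key, and profile name are placeholders):

$ AWS_PROFILE=myprofile h5ls s3://mybucket/myfile.h5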
Easier ROS3 logging
• HDF5_ROS3_VFD_DEBUG for S3 requests and related info to stderr. Any value
except false, off, or 0 enables debug logging.
• HDF5_ROS3_VFD_LOG_LEVEL for AWS-C-S3 library log info to a file with values:
error, info, debug, trace.
• The above two can be used together because they capture different logging information.
• HDF5_ROS3_VFD_LOG_FILE for AWS-C-S3 log file destination. Default file name:
hdf5_ros3_vfd.log.
HDF5_ROS3_VFD_DEBUG example
$ HDF5_ROS3_VFD_DEBUG=on h5dump -H -p s3://hdf5.sample/data/cohdf5/GEDI/PAGE08MiB_GEDI02_A_2023034194553_O23479_02_T00431_02_003_02_V002.h5 > /dev/null
-- parsed URL as:
- Scheme: s3
- Host: s3.us-west-2.amazonaws.com
- Path: /data/cohdf5/GEDI/PAGE08MiB_GEDI02_A_2023034194553_O23479_02_T00431_02_003_02_V002.h5
- Query:
- Bucket: hdf5.sample
- Key: data/cohdf5/GEDI/PAGE08MiB_GEDI02_A_2023034194553_O23479_02_T00431_02_003_02_V002.h5
-- HEAD: Bucket: hdf5.sample / Key: data/cohdf5/GEDI/PAGE08MiB_GEDI02_A_2023034194553_O23479_02_T00431_02_003_02_V002.h5
-- request headers:
Host: s3.us-west-2.amazonaws.com
User-Agent: libhdf5/2.0.0 (vfd:ros3) libaws-c-s3
-- response status: 200
-- response headers:
x-amz-id-2: gryevECCbexGVtuAmdSbTeyhEmEgEnSMryErkAcb5OYLba0DlATZEQgUiyf9niQ8YldfW55XOkA=
x-amz-request-id: RT436M8D0SX2ZVGD
Date: Tue, 24 Jun 2025 14:12:00 GMT
Last-Modified: Sun, 11 Feb 2024 18:32:48 GMT
ETag: "9b1d7670f5bafb409cf729964abb477b-89"
x-amz-server-side-encryption: AES256
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 1518338048
Server: AmazonS3
-- final HTTP status code: 200
-- file size: 1518338048 bytes
-- GET: Bytes 0 - 16777215, Request Size: 16777216
-- request headers:
Host: s3.us-west-2.amazonaws.com
User-Agent: libhdf5/2.0.0 (vfd:ros3) libaws-c-s3
Range: bytes=0-16777215
-- final HTTP status code: 206
-- GET: Bytes 1048576000 - 1056964607, Request Size: 8388608
-- request headers:
Host: s3.us-west-2.amazonaws.com
User-Agent: libhdf5/2.0.0 (vfd:ros3) libaws-c-s3
Range: bytes=1048576000-1056964607
-- final HTTP status code: 206
HDF5_ROS3_VFD_LOG_LEVEL example
$ HDF5_ROS3_VFD_LOG_LEVEL=info h5dump -H -p s3://hdf5.sample/data/cohdf5/GEDI/PAGE08MiB_GEDI02_A_2023034194553_O23479_02_T00431_02_003_02_V002.h5 > /dev/null
$ tail -n +26 hdf5_ros3_vfd.log | head -n 30
[INFO] [2025-06-24T23:59:11Z] [000000016fd0b000] [event-loop] - id=0x14265ac30: main loop started
[INFO] [2025-06-24T23:59:11Z] [000000016fd0b000] [event-loop] - id=0x14265ac30: default timeout 100s, and max events to process per tick 100
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [event-loop] - id=0x1426599b0: starting event-loop thread.
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [standard-retry-strategy] - static: creating new standard retry strategy
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [standard-retry-strategy] - id=0x14265b610: creating backing exponential backoff strategy with max_retries of 5
[INFO] [2025-06-24T23:59:11Z] [000000016fe17000] [event-loop] - id=0x1426599b0: main loop started
[INFO] [2025-06-24T23:59:11Z] [000000016fe17000] [event-loop] - id=0x1426599b0: default timeout 100s, and max events to process per tick 100
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [exp-backoff-strategy] - id=0x142659df0: Initializing exponential backoff retry strategy with scale factor: 0 jitter
mode: 0 and max retries 5
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [S3Client] - id=0x142659f20 Initiating making of meta request
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [connection-manager] - id=0x14265d3e0: Successfully created
[INFO] [2025-06-24T23:59:11Z] [00000001f2639f00] [S3Client] - id=0x142659f20: Created meta request 0x14265c7c0
[INFO] [2025-06-24T23:59:11Z] [000000016f5b7000] [S3ClientStats] - id=0x142659f20 Requests-in-flight(approx/exact):1/1 Requests-preparing:1 Requests-queued:0
Requests-network(get/put/default/total):0/0/0/0 Requests-streaming-waiting:0 Requests-streaming-response:0 Endpoints(in-table/allocated):1/1
[INFO] [2025-06-24T23:59:11Z] [000000016fbff000] [AuthCredentialsProvider] - (id=0x1426594c0) Default chain credentials provider successfully sourced credentials
[INFO] [2025-06-24T23:59:11Z] [000000016fbff000] [AuthSigning] - (id=0x14270b3d0) Signing successfully built canonical request for algorithm SigV4, with contents
HEAD
/hdf5.sample/data/cohdf5/GEDI/PAGE08MiB_GEDI02_A_2023034194553_O23479_02_T00431_02_003_02_V002.h5
host:s3.us-west-2.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20250624T235911Z
host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
[INFO] [2025-06-24T23:59:11Z] [000000016fbff000] [AuthSigning] - (id=0x14270b3d0) Signing successfully built string-to-sign via algorithm SigV4, with contents
AWS4-HMAC-SHA256
20250624T235911Z
20250624/us-west-2/s3/aws4_request
a7cba9b6346d36a06596caad68be12da7e2424309bb11ed604df13a90cbdf2eb
Recap: HDF5 filters
• Any software that transforms a block of bytes representing one HDF5 dataset
chunk can become an HDF5 filter if a filter plugin for it is written.
• Each filter plugin has a unique identifier assigned by The HDF Group.
• Anyone can create a filter plugin. Popular ones (subjective!) are included in the
HDF Group’s GitHub repo (hdf5_plugins) for HDF5 library continuous integration
activities and binary release packages.
• Official information about registered filter plugins:
https://github.com/HDFGroup/hdf5_plugins/blob/master/docs/RegisteredFilterPlugins.md
• Most filters are for data compression.
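A sketch of requesting a registered filter by its identifier when creating a dataset, using the LZ4 filter (registered ID 32004) as an example; H5Z_FLAG_OPTIONAL lets dataset creation proceed even when the plugin is absent:

#include "hdf5.h"

#define H5Z_FILTER_LZ4 32004  /* identifier assigned by The HDF Group */

int main(void) {
    hsize_t dims[1]  = {1000000};
    hsize_t chunk[1] = {100000};

    hid_t file  = H5Fcreate("lz4.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    H5Pset_chunk(dcpl, 1, chunk);
    /* The plugin itself is loaded dynamically when chunks are written or read */
    H5Pset_filter(dcpl, H5Z_FILTER_LZ4, H5Z_FLAG_OPTIONAL, 0, NULL);

    hid_t dset = H5Dcreate2(file, "x", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(file);
    return 0;
}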
HDF5 filter plugin news
• The official registered filter plugins document is continually updated to improve
data interoperability.
• Documented custom LZ4-compressed HDF5 dataset chunk format.
• Detailed specification of filter plugin configuration parameters.
• Bugfix in the LZ4 filter plugin when building with LZ4 >= v1.9.
• The zlib-ng library can be used for the DEFLATE compression filter.
• It is about 2x faster than zlib, the library traditionally used by HDF5.
• This is a build option (see the sketch after this list).
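A sketch of enabling it at configure time; the exact option names below are an assumption based on recent release notes, so verify them against the library's CMake options:

$ cmake -DHDF5_ENABLE_ZLIB_SUPPORT:BOOL=ON -DHDF5_USE_ZLIB_NG:BOOL=ON ../hdf5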
How to make advanced compression filters easily available?
• Except for DEFLATE and SZIP, all compression filters are dynamically loaded
plugins that must be available at runtime.
• Builds of the HDF5 library and of HDFG's hdf5_plugins repo are now compatible,
so no configuration is required to find the plugins at runtime.
• Encourage community package repository maintainers to provide “integrated”
builds of both the library and the most popular filter plugins.
• The hdf5_plugins repo could become a community resource and centralize
maintenance of the popular plugins. Current offerings: LZ4, LZF, BZIP2, BLOSC,
BLOSC2, Zstandard, Bitshuffle, BitGroom, Granular BitRound, JPEG, and ZFP.
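When plugins are installed in a non-default location, the long-standing HDF5_PLUGIN_PATH environment variable still points the library at them (the path and file below are placeholders):

$ HDF5_PLUGIN_PATH=/opt/hdf5/plugins h5dump -H compressed.h5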
Thank you!
ajelenak@hdfgroup.org
