Proposal for adding Named Dimensions
to HDF5 Arrays

Daniel Kahn
Science Systems and Applications, Inc.
HDF/HDF-EOS Meeting Oct 15-17th, Aurora CO

Motivation:
The Ozone Mapping and Profiler Suite (OMPS) data
produced by the O3 PEATE will be using HDF5, but not
HDF-EOS5. Features of HDF-EOS5 needed by the
PEATE are easily represented in HDF5, with the
exception of named array dimensions.
This talk describes a proposed method to add this
functionality to HDF5 and also shows how named
dimensions could be used to describe other types of
useful data structures.

Prior Art: HDF-EOS5 Swaths
An HDF-EOS5 Swath consists of a collection of arrays,
generally the result of satellite remote sensing
measurements (e.g., an orbit's worth of data).

Rectangular array of ozone
data projected on Earth.
An HDF-EOS5 Swath consists of
1) a list of named dimension metadata (Name-Value pairs), and
2) a set of arrays whose axes are defined by the named
dimensions.
HDF-EOS5 achieves this via a simple API, kept simple by
preventing the user from defining HDF5 group hierarchies.
[Figure: HDF-EOS5 Swath Cartoon Diagram. The File Root holds Swath 1 (Dataset1, Dataset2), Swath 2, etc., with Dim Metadata boxes.]

Goal: Offer a corresponding facility under HDF5
1) without sacrificing the ability to define arbitrary hierarchies and
2) using publicly defined attributes associated directly with the
HDF5 objects they help to describe.

Named Dimensions vs Numerical Dimensions
HDF5 uses numerical dimensions:
hid_t H5Screate_simple(int rank, const hsize_t *dims,
                       const hsize_t *maxdims)
dims is an array of numbers.
A named dimension is a Name-Value pair used to define
arrays. The equivalent statement using named dimensions
would look like:
H5ext_create_simple(rank, char **DimNames)
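As a rough illustration of the idea, the sketch below translates a list of dimension names into the numerical dims array that H5Screate_simple expects. It does not call HDF5: extent_t is a stand-in for hsize_t, and NamedDim and resolve_dims are hypothetical names that are not part of HDF5 or of the proposed H5ext interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for HDF5's hsize_t so the sketch compiles without the library. */
typedef unsigned long long extent_t;

/* A named dimension: a Name-Value pair (hypothetical type). */
typedef struct {
    const char *name;
    extent_t    value;
} NamedDim;

/*
 * Translate dimension names into a numerical dims array of length rank.
 * Returns 0 on success, or -1 if any name is missing from the table.
 */
int resolve_dims(const NamedDim *table, size_t ntable,
                 const char *const *names, int rank, extent_t *dims_out)
{
    assert(table != NULL && names != NULL && dims_out != NULL);
    for (int i = 0; i < rank; i++) {
        int found = 0;
        for (size_t j = 0; j < ntable; j++) {
            if (strcmp(table[j].name, names[i]) == 0) {
                dims_out[i] = table[j].value;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;  /* unknown dimension name */
    }
    return 0;
}
```

With a table holding Length = 40, Height = 100 and Width = 16, the names {"Width", "Height", "Length"} resolve to dims {16, 100, 40}, which could then be handed to H5Screate_simple.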
A challenge in defining named extents in HDF5 is that
we need some idea of the “scope” of the names
within the HDF5 Group hierarchy.

[Figure: a hierarchy of groups labeled G. One group, G*, carries Attributes NameValue1, NameValue2, etc.]
Should the other groups see the Extent Names in G*?

Presumably not, but some kind of inheritance would be nice.
We can define inheritance in which datasets in G* could
use Named Dims from G* and also extents inherited
from G*'s parent, G+, and so on up the hierarchy.

[Figure: a hierarchy of groups labeled G. G+ carries Attributes NameValue3, NameValue4, etc.; G* carries Attributes NameValue1, NameValue2, etc.]
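One way to picture this inheritance rule is a lookup that starts at a group and walks parent links until the name is found. The sketch below models groups in memory rather than through HDF5. Group, NamedDim and find_extent are hypothetical names. Letting the nearest definition win when a name appears at several levels is an assumption, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long long extent_t;  /* stand-in for hsize_t */

typedef struct {
    const char *name;
    extent_t    value;
} NamedDim;

/* Hypothetical in-memory model of a group and its named-extent attributes. */
typedef struct Group {
    const struct Group *parent;  /* NULL for the root group */
    const NamedDim     *dims;    /* extents defined on this group */
    size_t              ndims;
} Group;

/*
 * Look for name on group, then on its parent, and so on up the hierarchy.
 * Returns 0 and stores the extent on success, or -1 if no ancestor has it.
 * Assumption: the nearest definition shadows any definition higher up.
 */
int find_extent(const Group *group, const char *name, extent_t *value_out)
{
    assert(name != NULL && value_out != NULL);
    for (const Group *g = group; g != NULL; g = g->parent) {
        for (size_t i = 0; i < g->ndims; i++) {
            if (strcmp(g->dims[i].name, name) == 0) {
                *value_out = g->dims[i].value;
                return 0;
            }
        }
    }
    return -1;
}
```

Under this rule a dataset in G* can use the names defined on G* and on G+, while a sibling of G* sees only the names on G+ and above.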
What would the code look like?
Group = H5Gcreate("/", "G1")
H5ext_CreateNamedExtent(Group, "Name1", Value)
etc.
char *Names[] = {"Name1", "Name2", "Name3"}
ret = H5ext_CreateDataSet(Group, rank, Names, ArrayPointer)
This searches up the hierarchy for names.
if (ret == Error) then a Dimension Name was not found
in the hierarchy.
Normal HDF5 writing routines.
(close, close, close, etc.)


(next, side benefits)
Additional Benefits:
Dimension Scale-like capability built in…
…and Named Dimensions also open the possibility of having
scales for incommensurate mappings.
[Figure: Data, Scale and Scale Index Map arrays with axes labeled DimName1 and DimName2. The Index Map contains indexes of DimName2.]
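The index map idea can be read as follows; this is one interpretation of the figure, not a confirmed design. The data runs along DimName1, the scale runs along DimName2, and an index map along DimName1 stores, for each data element, an index into DimName2. The function name scale_lookup and the use of double for scale values are assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * scale     : scale values along DimName2 (n2 entries)
 * index_map : for each position along DimName1 (n1 entries),
 *             an index into DimName2
 * Returns 0 and stores the scale value for data element i,
 * or -1 if i or the mapped index is out of range.
 */
int scale_lookup(const double *scale, size_t n2,
                 const size_t *index_map, size_t n1,
                 size_t i, double *value_out)
{
    assert(scale != NULL && index_map != NULL && value_out != NULL);
    if (i >= n1)
        return -1;
    size_t j = index_map[i];   /* position along DimName2 */
    if (j >= n2)
        return -1;             /* map entry points outside the scale */
    *value_out = scale[j];
    return 0;
}
```

Because the two extents are related only through the map, they need not be equal or share a common divisor, which is one way to handle incommensurate mappings.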
The Named Dimension approach also appears to be
isomorphic to compound datatypes.
[Figure: on one side, one array of compound type: a three-value compound datatype (Field 1, Field 2, Field 3) with Rows along DimName1. On the other side, a Measurement Group holding three ordinary arrays of basic type (Field 1, Field 2, Field 3), each with Rows along DimName1.]
They seem equivalent but....
With the Named Dimension approach fields can be added or
deleted trivially and the field elements themselves can easily
have other dimensions.
[Figure: Field 1, Field 2 and Field 3, each with Rows along DimName1, plus (+) an added Field 4 whose second axis appears to be labeled DimName2.]

The target language can read/write the data without any
dependence on the C language implementation of
structures.
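The two layouts can be compared in a few lines of C. In the sketch below, Record plays the role of a compound datatype, while the named-dimension layout is three ordinary arrays that share the row dimension. The field types (int, float, double) and the name gather_row are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compound-datatype view: one C struct per row. Adding or removing a
   field changes this struct, and with it every reader that uses it. */
typedef struct {
    int    field1;
    float  field2;
    double field3;
} Record;

/*
 * Named-dimension view: one ordinary array per field, all of length rows
 * (the shared DimName1). This helper assembles row r into a Record to show
 * that both views carry the same information.
 */
Record gather_row(const int *field1, const float *field2,
                  const double *field3, size_t rows, size_t r)
{
    assert(field1 != NULL && field2 != NULL && field3 != NULL);
    assert(r < rows);
    Record rec = { field1[r], field2[r], field3[r] };
    return rec;
}
```

In the per-field layout a fourth field is just one more array along the same dimension, and a language without C-style structs can read each array on its own.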
Implementation Thoughts:
The most likely implementation would be to publicly define
HDF5 attributes that represent the metadata and to write a C
interface to simplify programming.

Potential Pitfalls:
Order of Dimension Names is language dependent.

Need bi-directional parent-child relationships.

others?
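On the first pitfall: C stores arrays in row-major order and Fortran in column-major order, and the HDF5 Fortran interface presents dimensions in the reverse of the C order. A stored list of dimension names would therefore likely need to be reversed for one of the two languages. The helper below is a minimal sketch of that reversal; reverse_dim_names is a hypothetical name, and which ordering the file should store is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse a list of dimension names in place, for example to convert
   between C (row-major) and Fortran (column-major) ordering. */
void reverse_dim_names(const char **names, size_t rank)
{
    assert(names != NULL || rank == 0);
    for (size_t lo = 0, hi = rank; lo + 1 < hi; lo++, hi--) {
        const char *tmp = names[lo];
        names[lo] = names[hi - 1];
        names[hi - 1] = tmp;
    }
}
```

Applied to {"Width", "Height", "Length"}, this yields {"Length", "Height", "Width"}.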


Ideas or comments are welcome and needed, in
particular on any gaps or problems with the proposed
design.
Conclusion:
The proposed method, called “Named
Dimensions”, is expected to provide in vanilla HDF5 a
data representation equivalent to the named dimensions of
HDF-EOS5.
In addition, other benefits may be realized,
including full use of the HDF5 group hierarchy, a
flexible Dimension Scale-like capability, and an
alternative to compound datatypes for representing data
that does not depend on C-style structures.
The Named Dimensions facility needs to be
implemented on top of HDF5 using publicly defined
attributes, with no changes to the base library
(i.e., no PEATE branch of HDF5).
Acknowledgment: This work was carried out under NASA contract NNG06HX18C, task 614.5-01-06

More Detailed Implementation Thoughts:
Group Attributes
DimName_001: "Length"
DimName_002: "Height"
DimName_003: "Width"
DimExtent_001: 40
DimExtent_002: 100
DimExtent_003: 16

DataSet Attributes
DimNames: {"Width", "Height", "Length"}

[Figure: a Group box holding a DataSet drawn with edges of 40, 100 and 16.]
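The numbered attribute scheme above could be read back roughly as follows. The sketch keeps the attributes in a plain array instead of reading them through HDF5. Attr, find_attr and lookup_extent are hypothetical names. It also assumes that numbering starts at 001 and has no gaps, which the slide does not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned long long extent_t;  /* stand-in for hsize_t */

/* Hypothetical in-memory stand-in for one group attribute. */
typedef struct {
    const char *key;     /* e.g. "DimName_001" or "DimExtent_001" */
    const char *text;    /* string value, or NULL for numeric attributes */
    extent_t    number;  /* numeric value, unused for string attributes */
} Attr;

static const Attr *find_attr(const Attr *attrs, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(attrs[i].key, key) == 0)
            return &attrs[i];
    return NULL;
}

/*
 * Find the DimName_NNN attribute whose value equals dim_name and return
 * the matching DimExtent_NNN value. Returns 0 on success, -1 otherwise.
 * Assumption: NNN counts up from 001 without gaps.
 */
int lookup_extent(const Attr *attrs, size_t n,
                  const char *dim_name, extent_t *out)
{
    char key[32];
    assert(attrs != NULL && dim_name != NULL && out != NULL);
    for (unsigned idx = 1; idx <= 999; idx++) {
        snprintf(key, sizeof key, "DimName_%03u", idx);
        const Attr *name_attr = find_attr(attrs, n, key);
        if (name_attr == NULL)
            return -1;  /* ran out of numbered names */
        if (name_attr->text != NULL && strcmp(name_attr->text, dim_name) == 0) {
            snprintf(key, sizeof key, "DimExtent_%03u", idx);
            const Attr *ext_attr = find_attr(attrs, n, key);
            if (ext_attr == NULL)
                return -1;  /* name present but extent missing */
            *out = ext_attr->number;
            return 0;
        }
    }
    return -1;
}
```

Resolving the dataset's DimNames {"Width", "Height", "Length"} against the group attributes listed above gives the shape {16, 100, 40}.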
