The HDF Group

Improving long-term preservation of EOS data by independently mapping HDF4 data objects
Mike Folk, Ruth Aydt, Joe Lee, Binh-Minh Ribler, Kent Yang
Ruth Duerr, Christopher Lynnes
The 14th HDF and HDF-EOS Workshop
September 28-30, 2010

HDF/HDF-EOS Workshop XIV

www.hdfgroup.org
Mapping project team members

The HDF Group
• Ruth Aydt
• Peter Cao
• Mike Folk
• Joe Lee
• Elena Pourmal
• Tong Qi
• Binh-Minh Ribler
• Eunsoo Seo
• Veer Singh
• Muqun {Kent} Yang

NASA
• Ruth Duerr (NSIDC)
• Chris Lynnes (GESDISC)

HDF4 files are complex

How do HDF users avoid
having to deal with all of that
complexity?

Through the HDF software libraries, either by using HDF APIs directly or by using HDF tools that depend on the HDF libraries.

But what about the future…

Over the long term, there is a risk in depending solely on HDF software to access HDF-formatted data.
It is possible that, in the distant future, the software may not be available.

“If only we could read HDF data with an
independent program that does not rely on
the HDF API…
A possible approach [would be to create] a
map of a data file, [and] utilities to
find, assemble and write out SDSes and
vdatas.”
“Leveraging HDF Utilities,” Christopher Lynnes, HDF Workshop X.

User’s view of the HDF4 SD model

Mapping SDS to file offset/length

[Figure: HDF4 file layout]
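
The idea on this slide can be sketched in a few lines of Python: if the map records a byte offset and a length for an SDS, a reader only has to seek, read, and decode. This is a minimal sketch, not the project's reader. The function names, the example offsets, and the assumption that the data is contiguous, uncompressed, big-endian 16-bit integers are all illustrative.

```python
import struct

def read_sds_block(path, offset, length):
    """Return `length` raw bytes starting at byte `offset` of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise ValueError("file is shorter than the map entry claims")
    return data

def decode_int16_be(raw):
    """Decode raw bytes as big-endian, two's-complement 16-bit integers.

    Assumption for this sketch: the map says the element type is int16
    and the byte order is big-endian.
    """
    count = len(raw) // 2
    return list(struct.unpack(">%dh" % count, raw[:count * 2]))
```

Nothing here calls the HDF library: the offset and length come from the map, and the rest is ordinary file I/O.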

Mapping with compressed chunks

[Figure: HDF4 file layout]
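
When an SDS is chunked and compressed, the map has to list each chunk separately, and the reader must decompress every chunk and place it in the right part of the array. The sketch below assumes zlib-wrapped deflate compression, one byte per element, a 2-D array whose dimensions are exact multiples of the chunk dimensions, and row-major order inside each chunk. The function name and the shape of the `chunks` argument are hypothetical and are not taken from the mapping schema.

```python
import zlib

def assemble_chunks(path, chunks, chunk_shape, array_shape):
    """Rebuild a 2-D byte array from deflate-compressed chunks.

    chunks: iterable of (offset, length, (chunk_row, chunk_col)), where
    offset/length locate the compressed bytes in the file and
    (chunk_row, chunk_col) is the chunk's position in the chunk grid.
    Assumes array dimensions are exact multiples of the chunk dimensions.
    """
    c_rows, c_cols = chunk_shape
    n_rows, n_cols = array_shape
    out = [bytearray(n_cols) for _ in range(n_rows)]
    with open(path, "rb") as f:
        for offset, length, (cr, cc) in chunks:
            f.seek(offset)
            raw = zlib.decompress(f.read(length))
            # Copy each row of the chunk into its slot in the full array.
            for r in range(c_rows):
                row = cr * c_rows + r
                col = cc * c_cols
                out[row][col:col + c_cols] = raw[r * c_cols:(r + 1) * c_cols]
    return [bytes(row) for row in out]
```

Other compression methods (Szip, JPEG) would need their own decoders, which is why the target user is assumed to know the compression schemes in use today.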

Recap
• Problem
  • The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.

• Solution
  • Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.

HDF4 mapping workflow

[Diagram: HDF4 File → hmap (linked with HDF4 library) → HDF4 Mapping File (XML document), which holds groups, data objects, structural and application metadata, and locations of object data. A reader program uses the mapping file to retrieve object data from the HDF4 file.]
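
The reader side of this workflow might look roughly like the sketch below: parse the XML map, collect the byte ranges for one object, then read those bytes from the HDF4 file. The element and attribute names in `EXAMPLE_MAP` (`map`, `array`, `block`, `offset`, `nBytes`) are invented for illustration and should not be read as the project's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical map document; the real mapping schema is not shown in the slides.
EXAMPLE_MAP = """
<map>
  <array name="temperature" type="int16" byteOrder="bigEndian">
    <block offset="6" nBytes="6"/>
  </array>
</map>
"""

def list_blocks(map_xml):
    """Return {array name: [(offset, nBytes), ...]} from a map document."""
    root = ET.fromstring(map_xml)
    result = {}
    for array in root.iter("array"):
        result[array.get("name")] = [
            (int(b.get("offset")), int(b.get("nBytes")))
            for b in array.iter("block")
        ]
    return result

def read_object_data(hdf4_path, blocks):
    """Concatenate the bytes of every block listed for one object."""
    parts = []
    with open(hdf4_path, "rb") as f:
        for offset, n_bytes in blocks:
            f.seek(offset)
            parts.append(f.read(n_bytes))
    return b"".join(parts)
```

The point of the design is that only a generic XML parser and plain file reads are needed at this stage; the HDF4 library is used once, when hmap writes the map.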

Target User
• Person 20+ years in the future
• Interested in data stored in HDF4 file
• Has HDF4 file and companion map file
• Can “write a program”

• May not have:
  • HDF4 data model, format, documentation, or software
  • Mapping schema, documentation, or software

• Will have knowledge of:
  • Basic XML
  • Data representations used today
  • Compression used by HDF4 (JPEG, Szip, etc.)
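
"Data representations used today" covers things like two's-complement integers, IEEE floating point, and big- or little-endian byte order. The short sketch below shows why the map has to state them explicitly: the same bytes decode to different values under different assumptions. The byte values are made up for the example.

```python
import struct

raw = bytes([0xFF, 0xFE, 0x3F, 0x80, 0x00, 0x00])

# Two's-complement 16-bit integer: same two bytes, two byte orders.
big = struct.unpack(">h", raw[:2])[0]      # bytes read as 0xFFFE
little = struct.unpack("<h", raw[:2])[0]   # bytes read as 0xFEFF

# IEEE 754 single-precision float, big-endian: 0x3F800000.
value = struct.unpack(">f", raw[2:])[0]

print(big, little, value)  # -2 -257 1.0
```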

Project Phases
• Phase 1
  • Categorize HDF4 data held by NASA.
  • Build a prototype
    • XML layout representation
    • Tool to create XML map file for given HDF4 file
    • Tools to read HDF4 data based solely on map files

• Phase 2
  • Build a robust version
  • Deploy

How many HDF4 products?
Data Center     HDF4 Products
ASF                         0
GES-DISC                  236
GHRC                       54
ASDC                       63
LP-DAAC                    67
NSIDC                      47
ORNL-DAAC                   2
PO.DAAC                    22
SDAC                        0
MrDC                       95
Total                     586

Data characteristics
Product Characteristics Examined
• Product Identification
  • Product Name
  • Data Level
  • Archive Location

• For HDF-EOS products
  • HDF-EOS version
  • For swath data
    • Number of swaths
    • Maximum number of dimensions
    • Organized by time, space, both, or other
  • Etc.

• For SDS data
  • Number of SDSs
  • Max number of dimensions
  • Did any SDS have attributes
  • Was any SDS annotated
  • Were dimension scales used
  • Was compression used and, if so, what kind
  • Was chunking used

• For Vdata
  • Number of Vdata structures
  • Did any have attributes
  • Did any fields have attributes

• Etc.

Phase 2 tasks
A. Investigate integration of mapping schema with existing standards
B. Determine HDF-EOS2 requirements
C. Redesign and expand the XML schema
D. Implement production-quality map writer
E. Develop demo map reader
F. Deploy tools at select NASA data centers

Task A
Investigate integration of mapping schema with existing standards

Investigate existing standards
• Investigated:
  • METS, PREMIS, ESML, NcML, and CSML

• Concluded:
  • Existing standards have different purposes than mapping schema
  • None meet all needs of mapping project

• Develop new schema tailored to project goals
  • Harmonize with PREMIS
  • Leverage terminology and approaches from all

Task B
Determine HDF-EOS2 requirements

Categorize HDF-EOS2 data products
• Created a data pool from NASA data centers
  • GES DISC, NSIDC, LAADS, LP DAAC, LaRC, PO.DAAC, GHRC, OBPG

• Detailed description of sample data
• Reported options for adding HDF-EOS2 contents to the mapping file
• Documents and reports at wiki: http://wiki.hdfgroup.org/MappingPhase2_TaskB

Task C
Redesign Schema

Design priorities
• Mapping files
  • Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
  • Have enough information to stand on their own
  • Be as simple as possible

• Mapping schema
  • Describe the Mapping files
  • Used for validation and documentation
  • May not be available to target user

Representation of HDF4 Objects
HDF4 User-Level Object    Mapping File XML Element
Attribute, Annotation     Attribute
Vgroup                    Group
Vdata                     Table
SDS                       Array
Dimension                 Dimension
Raster Image              Not yet done
Palette                   Not yet done

Mapping File – Group & Table (fragment)

[Figure: XML fragment of the mapping file for AMSR_E_L2_Land_V09_200501180027_D, with three callouts]
• Represents HDF4 objects and relationships
• Information needed to access and interpret raw data in HDF4 file
• Select raw data values included to help user verify binary data handled properly
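
As an illustration of what "information needed to access and interpret raw data" could amount to for a Table (Vdata), the sketch below decodes fixed-width records from raw bytes. It assumes records are stored one after another with their fields interleaved, and that the map supplies field names, types, and byte order. The format string and field names are hypothetical.

```python
import struct

def read_table(raw, record_format, field_names):
    """Split raw bytes into fixed-width records and decode each field.

    record_format is a struct format string for ONE record, for example
    ">hf" for a big-endian int16 followed by a big-endian float32.
    len(raw) must be a whole number of records.
    """
    record = struct.Struct(record_format)
    rows = []
    for values in record.iter_unpack(raw):
        rows.append(dict(zip(field_names, values)))
    return rows
```

A reader could then compare the first decoded row with the sample values stored in the map, which is the verification role the third callout describes.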

Status and Plans
• Status
  • Map file design stabilizing for most HDF4 objects

• Plans
  • Complete design for Raster Images and Palettes
  • Continue to refine instructions and contents
  • Finalize schema

Task D
Implement Writer

Map Writer Requirements
• Retrieve information needed from HDF4 file
• Write out corresponding XML file
• Quality requirements
  • Completeness: don’t miss any objects in file.
  • Accuracy: don’t give wrong information.

Writer Status and Plan
• Status
  • Covers most Vgroup/Vdata/SDS objects.
  • Covers some GR/Annotation objects.
  • Being tested with NASA data.

• Plans
  • Increase coverage / accuracy / reliability.

Task E
Implement demo reader

Demo Reader Requirements
• Multiplatform command-line tool
• Easy to use, with clear arguments and output
• Must validate that objects in the mapping file are actually in the HDF4 file
• Developed in a well-supported high-level language (Python)
• Well documented
• Available as open source
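
The validation requirement could be approached in at least two simple ways, sketched below: check that every byte range named in the map fits inside the HDF4 file, and check that a sample value recorded in the map matches the bytes actually found at that location. The slides do not describe how the demo reader performs this check, so the function names and arguments are assumptions.

```python
import os

def validate_blocks(hdf4_path, blocks):
    """Return the (offset, length) entries that do not fit inside the file."""
    size = os.path.getsize(hdf4_path)
    return [
        (offset, length)
        for offset, length in blocks
        if offset < 0 or length < 0 or offset + length > size
    ]

def verify_sample(hdf4_path, offset, expected):
    """Check that the bytes at `offset` match a sample value from the map."""
    with open(hdf4_path, "rb") as f:
        f.seek(offset)
        return f.read(len(expected)) == expected
```

An empty list from `validate_blocks` and a `True` from `verify_sample` would suggest the map and the file belong together; neither check proves it.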

Demo Reader Status
• Status
  • Only Vdata support provided so far
  • Current source code available at https://sourceforge.net/projects/pyhdf
  • Documentation at http://pyhdf.sourceforge.net/

• Plans
  • SDS and RIS support

Task F
Deploy

Deploy
• Begin in Jan 2011, complete in April
• Activities:
  • GES DISC
    • Incorporate into the existing archive ingest system
    • Manage the retrofit into existing metadata files

  • NSIDC
    • Support implementation in NSIDC’s ECS system

  • Other ESDCs
    • Encouraged to join in
    • But deployment to other centers expected subsequent to the project.

Thank You!

Acknowledgements
This work was supported by cooperative agreement
number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily reflect
the views of the National Aeronautics and Space
Administration.

Questions/comments?

Extra slides


Editor's Notes

  • #6: Full quote, from proposal: Through the HDF software libraries, either by using the HDF APIs directly or by using HDF tools that depend on the HDF libraries. However, there is a risk in depending solely on the HDF libraries to access HDF-formatted data over the long term. It is possible, especially in the distant future, that the libraries may not be as readily available as they are today. To address this risk, it is desirable to have a way to retrieve the data independently. At the 10th HDF workshop, Christopher Lynnes of the Goddard Earth Sciences Data and Information Services Center (GES DISC) addressed this need: “If only we could read HDF data with an independent program that does not rely on the HDF API… A possible approach [would be to] extend hdfls to print a hierarchical map of a data file, [and] write ncdump/hdp-like utilities to find, assemble and write out SDSes and vdatas.” “Leveraging HDF Utilities,” Christopher Lynnes, 10th HDF Workshop. http://www.hdfeos.org/workshops/ws10/presentations/day3/Leveraging_HDF_Utilities.ppt.
  • #14: The HDF4 Mapping Schema describes an XML Document that provides access to content originally stored in a binary HDF4 file. The HDF4 Mapping Schema is defined by one or more XML schema documents written in the XML Schema Definition Language, XSDL. An HDF4 Mapping File is an XML Document that conforms to the HDF4 Mapping Schema. Data representations used today: twos-complement, IEEE floating point, big/little endian.
  • #20: METS = Metadata Encoding and Transmission Standard; a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library. PREMIS = PREservation Metadata: Implementation Standard; the PREMIS Data Dictionary defines a core set of semantic units that repositories should know in order to perform their preservation functions. Format-specific metadata is excluded as out of scope. ESML = Earth Science Markup Language. NcML = NetCDF Markup Language [schema used with Common Data Model (CDM) datasets]. CSML = Climate Science Modelling Language.
  • #26: AMSR_E_L2_Land_V09_200501180027_D
  • #41: AIRS.2002.08.31.L3.RetStd_H001.v5.0.14.0.G07178195754
  • #42: Test file created for project