HDF5 Advanced Topics

Elena Pourmal
The HDF Group
The 15th HDF and HDF-EOS Workshop
April 17, 2012
April 17-19

HDF/HDF-EOS Workshop XV

Goal
• To learn about HDF5 features important for
writing portable and efficient applications using
H5Py

Outline
• Groups and Links
• Types of groups and links
• Discovering objects in an HDF5 file

• Datasets
• Datatypes
• Partial I/O
• Other features
• Extensibility
• Compression

GROUPS AND LINKS

Groups and Links
• Groups are containers for links (graph edges)
• Links were added in 1.8.0
• Warning: Many APIs in H5G interface are
obsolete - use H5L interfaces to discover and
manipulate file structure

Groups and Links
• HDF5 groups and links organize data objects.
• Every HDF5 file has a root group (“/”).

[Figure: example file hierarchy under the root group “/”, with the labels “Viz”, “SimOut”, “Timestep” (36,000), and “Parameters” (10;100;1000), an “Experiment Notes” annotation, and a small table.]

Experiment Notes:
  Serial Number: 99378920
  Date: 3/13/09
  Configuration: Standard 3

lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6

Example h5_links.py
Different kinds of links

[Figure: file links.h5 with root group “/”, groups “A” and “B”, and links “a”, “soft”, “dangling”, and “External”; a second file, dset.h5.]

• Dataset can be “reached” using three paths: /A/a, /a, /soft
• External: dataset is in a different file (dset.h5)

Example h5_links.py
Different kinds of links

[Figure: file links.h5 with root group “/”, groups “A” and “B”, and links “a”, “soft”, and “dangling”.]

• Hard links “A” and “B” were created when the groups were created
• Hard link “a” was added to the root group and points to an existing dataset
• Soft link “soft” points to the existing dataset (compare a UNIX alias)
• Soft link “dangling” doesn’t point to any object

Links
• Name
  • Example: “A”, “B”, “a”, “dangling”, “soft”
  • Unique within a group; “/” is not allowed in names
• Type
  • Hard Link
    • Value is the object’s address in the file
    • Created automatically when an object is created
    • Can be added to point to an existing object
  • Soft Link
    • Value is a string, for example “/A/a”, but can be anything
    • Use to create aliases

Links (cont.)
• Type
  • External Link
    • Value is a pair of strings, for example (“dset.h5”, “dset”)
    • Use to access data in other HDF5 files
    • Example: for NPP data products, geo-location information may be in a separate file

Links Properties
• ASCII or UTF-8 encoding for names
• Create intermediate groups
  • Saves programming effort
• C example
    lcpl_id = H5Pcreate(H5P_LINK_CREATE);
    H5Pset_create_intermediate_group(lcpl_id, 1);  /* turn on creation of missing groups */
    H5Gcreate(fid, "A/B", lcpl_id, H5P_DEFAULT, H5P_DEFAULT);
• Group “A” will be created if it doesn’t exist

Operations on Links
• See H5L interface in Reference Manual
• Create
• Delete
• Copy
• Iterate
• Check if exists

Operations on Links
• APIs available for C and Fortran
• Use dictionary operations in Python
• Objects associated with links ARE NOT affected
• Deleting a link removes a path to the object
• Copying a link doesn’t copy an object
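
The link semantics above can be sketched without HDF5 at all. The following is a minimal plain-Python model, not the h5py or H5L API: groups are dictionaries, a hard link is a second reference to the same object, and a soft link is a stored path string. The `SoftLink` class, the `resolve` helper, the soft-link target "/A/a", and the path "/nowhere" are assumptions made for illustration.

```python
class SoftLink:
    """A soft link stores only a path string; it is resolved on access."""
    def __init__(self, path):
        self.path = path


def resolve(root, path):
    """Follow a "/"-separated path from the root group; None if it dangles."""
    obj = root
    for name in path.strip("/").split("/"):
        if not isinstance(obj, dict) or name not in obj:
            return None
        obj = obj[name]
        if isinstance(obj, SoftLink):
            obj = resolve(root, obj.path)
    return obj


dataset = [1, 2, 3]                       # stands in for a dataset object
root = {"A": {"a": dataset}, "B": {}}     # hard links "A", "B" and "A/a"
root["a"] = dataset                       # second hard link, same object
root["soft"] = SoftLink("/A/a")           # soft link: just a path (assumed target)
root["dangling"] = SoftLink("/nowhere")   # soft link with no target

paths = ("/A/a", "/a", "/soft")
reachable_before = [p for p in paths if resolve(root, p) is dataset]

del root["A"]["a"]                        # removes one path, not the object
reachable_after = [p for p in paths if resolve(root, p) is dataset]

print(reachable_before)
print(reachable_after)
```

In this model, deleting the link in "A" leaves the object untouched and still reachable through /a, while the soft link that named /A/a now dangles.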

Example h5_links.py
Link “a” in “A” is removed

[Figure: file links.h5 with root group “/”, groups “A” and “B”, and links “a”, “soft”, “dangling”, and “External”; a second file, dset.h5.]

• Dataset can be “reached” using one path: /a
• External: dataset is in a different file (dset.h5)

Example h5_links.py
Link “a” in root is removed

[Figure: file links.h5 with root group “/”, groups “A” and “B”, and links “soft”, “dangling”, and “External”; a second file, dset.h5.]

• Dataset is unreachable
• External: dataset is in a different file (dset.h5)

Groups Properties
• Creation properties
  • Type of links storage
    • Compact (in 1.8.* versions)
      • Used with a few members (default under 8)
    • Dense (default behavior)
      • Used with many (>16) members (default)
  • Tunable size for a local heap
    • Save space by providing an estimate of the storage required for link names
  • Can be compressed (in 1.8.5 and later)
    • Many links with similar names (XXX-abc, XXX-d, XXXefgh, etc.)
    • Requires more time to compress/uncompress data

Groups Properties
• Creation properties
  • Links may have creation order tracked and indexed
    • Indexing by name (default)
      • A, B, a, dangling, soft
    • Indexing by creation order (has to be enabled)
      • A, B, a, soft, dangling
• http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/api18-c.html

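The two orders listed above differ only in how the index is keyed. A minimal sketch, assuming names are compared by character code (which is consistent with the order shown on the slide); the variable names are hypothetical and no HDF5 call is made.

```python
# Link names in the order they were created in the h5_links example.
created = ["A", "B", "a", "soft", "dangling"]

by_creation_order = list(created)   # index keyed on creation order
by_name = sorted(created)           # index keyed on name: uppercase sorts first

print("by name:          ", by_name)
print("by creation order:", by_creation_order)
```

Only the last two names swap places, because "dangling" sorts before "soft" but was created after it.
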
Discovering HDF5 file’s structure
• HDF5 provides C and Fortran 2003 APIs for recursive and non-recursive iterations over the groups and attributes
  • H5Ovisit and H5Literate (H5Giterate)
  • H5Aiterate
• Life is much easier with H5Py (h5_visita.py)

    import h5py

    def print_info(name, obj):
        # Print the object's path, then each of its attributes.
        print(name)
        for attr_name, value in obj.attrs.items():
            print(attr_name + ":", value)

    f = h5py.File('GATMO-SATMS-npp.h5', 'r+')
    f.visititems(print_info)
    f.close()

Checking a path in HDF5
• HDF5 1.8.8 provides HL C and Fortran 2003 APIs for checking if a path exists
  • H5LTpath_valid (h5ltpath_valid_f)
  • Example: Is there an object with a path /A/B/C/d ?
  • TRUE if there is a path, FALSE otherwise

Hints
• Use the latest file format (see the H5Pset_libver_bounds function in the RM)
  • Saves space when creating a lot of groups in a file
  • Saves time when accessing many objects (>1000)
  • Caution: Tools built with HDF5 versions prior to 1.8.0 will not work on files created with this property

DATASETS

HDF5 Datatypes

HDF5 Datatypes
• Integer and floating point
• String
• Compound
  • Similar to C structures or Fortran Derived Types
• Array
• References
• Variable-length
• Enum
• Opaque

HDF5 Datatypes
• Datatype descriptions
  • Are stored in the HDF5 file with the data
  • Include encoding (e.g., byte order, size, and floating point representation) and other information to assure portability across platforms
• See C, Fortran, MATLAB and Java examples under http://www.hdfgroup.org/ftp/HDF5/examples/

Data Portability in HDF5

[Figure: an array of integers on an Intel platform is written with H5Dwrite to a file as H5T_STD_I32LE and read with H5Dread into an array of long integers on a SPARC64 platform.]

• Intel platform: int is little-endian, 4 bytes
• SPARC64 platform: long is big-endian, 8 bytes

Data Portability in HDF5 (cont.)
• We use the native integer type to describe the data in the file:
    dset = H5Dcreate(file, NAME, H5T_NATIVE_INT, …
• Description of the data in the buffer:
    H5Dwrite(dset, H5T_NATIVE_INT, …, buf);
• Description of the data in the buffer; the library will perform the conversion from 4-byte LE to 8-byte BE integers:
    H5Dread(dset, H5T_NATIVE_LONG, …, buf);

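A rough picture of the conversion the library performs, using only Python's standard `struct` module rather than HDF5. The sample values are made up; `<i` stands in for the 4-byte little-endian file type (H5T_STD_I32LE) and `>q` for an 8-byte big-endian long in memory.

```python
import struct

values = [1, -2, 300000]                       # made-up sample data

# What lands in the file: 4-byte little-endian integers.
file_bytes = struct.pack("<3i", *values)

# What a reader asking for 8-byte big-endian integers should receive.
decoded = struct.unpack("<3i", file_bytes)
memory_bytes = struct.pack(">3q", *decoded)

print(len(file_bytes), len(memory_bytes))
print(struct.unpack(">3q", memory_bytes))
```

The values survive unchanged while both the size and the byte order of each element differ, which is the work the slide attributes to the library during H5Dread.
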
Hints
• Avoid datatype conversion if possible
• Store necessary precision to save space in
a file
• Starting with HDF5 1.8.7, Fortran APIs
support different kinds of integers and floats
(if Fortran 2003 feature is enabled)

HDF5 Strings

HDF5 Strings
• Fixed length
  • Data elements have to have the same size
  • Short strings will use more bytes than needed
  • Application is responsible for providing buffers of the correct size on read
• Variable length
  • Data elements may not have the same size
  • Writing/reading strings is “easy”; the library handles memory allocations

HDF5 Strings – Fixed-length
• Example h5_string.py(c,f90)

    fixed_string = np.dtype('S10')
    dataset = file.create_dataset("DSfixed", (4,), dtype=fixed_string)
    data = ("Parting", ".is such", ".sweet", ".sorrow...")
    dataset[...] = data

• Stores four strings “Parting”, “.is such”, “.sweet”, “.sorrow...” in a dataset.
• Strings have length 10
• Python uses NULL-padded strings (default)

HDF5 Strings
• Example h5_vlstring.py(c,f90)

    str_type = h5py.special_dtype(vlen=str)
    dataset = file.create_dataset("DSvariable", (4,), dtype=str_type)
    data = ("Parting", " is such", " sweet", " sorrow...")
    dataset[...] = data

• Stores four strings “Parting”, “ is such”, “ sweet”, “ sorrow...” in a dataset.
• Strings have length 7, 8, 6, 10

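To make the space trade-off between the two string kinds concrete, here is a small sketch in plain Python (no HDF5 involved). It pads the four strings from the variable-length example to 10 bytes with NULLs, as a fixed-length type would, and compares byte counts. It counts string bytes only and ignores any bookkeeping the library adds for variable-length data.

```python
strings = [b"Parting", b" is such", b" sweet", b" sorrow..."]
FIXED_LEN = 10

# Fixed-length storage: every element is padded with NULL bytes to 10 bytes.
fixed = [s.ljust(FIXED_LEN, b"\0") for s in strings]
fixed_bytes = sum(len(s) for s in fixed)

# Variable-length storage: every element keeps its own length.
lengths = [len(s) for s in strings]
variable_bytes = sum(lengths)

print(lengths, fixed_bytes, variable_bytes)
```
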
Hints
• Fixed length strings
  • Can be compressed
  • Use when you need to store a lot of strings
• Variable-length strings
  • Compression cannot be applied to data
  • Use for attributes and a few strings if space is a concern

HDF5 Compound Datatypes

HDF5 Compound Datatypes
• Compound types
  • Comparable to C structures or Fortran 90 Derived Types
  • Members can be of any datatype
  • Data elements can be written/read by a single field or a set of fields

Creating and Writing Compound Dataset
• Example h5_compound.py(c,f90)
• Stores four records in the dataset

Orbit (integer) | Location (string) | Temperature (F) (64-bit float) | Pressure (inHg) (64-bit float)
----------------|-------------------|--------------------------------|-------------------------------
1153            | Sun               | 53.23                          | 24.57
1184            | Moon              | 55.12                          | 22.95
1027            | Venus             | 103.55                         | 31.33
1313            | Mars              | 1252.89                        | 84.11

Creating and Writing Compound Dataset

    comp_type = np.dtype([('Orbit', 'i'), ('Location', np.bytes_, 6), ….)
    dataset = file.create_dataset("DSC", (4,), comp_type)
    dataset[...] = data

Note for C and Fortran 2003 users:
• You’ll need to construct memory and file datatypes
• Use the HOFFSET macro instead of calculating offsets by hand.
• The order of H5Tinsert calls is not important if HOFFSET is used.

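The advice about HOFFSET can be illustrated with Python's `ctypes`, which lays out a structure the way the platform's C compiler would. This is a sketch under the assumption that the record matches the four columns of the table (the `Record` class and its field names are hypothetical); the exact offsets depend on the platform, which is why they should be queried rather than computed by hand.

```python
import ctypes


class Record(ctypes.Structure):
    """Hypothetical C-style record with the four fields from the table."""
    _fields_ = [
        ("orbit", ctypes.c_int),
        ("location", ctypes.c_char * 6),
        ("temperature", ctypes.c_double),
        ("pressure", ctypes.c_double),
    ]


# Ask the layout for each member's offset instead of adding up sizes by hand.
offsets = {name: getattr(Record, name).offset for name, _ in Record._fields_}

# Sum of the member sizes; padding can make the real structure larger.
packed_size = sum(ctypes.sizeof(ftype) for _, ftype in Record._fields_)

print(offsets)
print(ctypes.sizeof(Record), packed_size)
```

Where the structure is larger than the sum of its members, the difference is alignment padding, and a hand-computed offset that ignores it would point at the wrong bytes.
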
Reading Compound Dataset

    f = h5py.File('compound.h5', 'r')
    dataset = f["DSC"]
    ….
    orbit = dataset['Orbit']
    print("Orbit: ", orbit)
    data = dataset[...]
    print(data)
    ….
    print(dataset[2, 'Location'])

Fortran 2003
• HDF5 Fortran library 1.8.8 with Fortran 2003
enabled has the same capabilities for writing
derived types as C library
• H5OFFSET function
• No need to write/read by fields as before

Hints
• When to use compound datatypes?
  • Application needs access to the whole record
• When not to use compound datatypes?
  • Application needs access to specific fields often
  • Store each field in its own dataset

[Figure: two layouts under the root group “/”: a single compound dataset “DSC”, and separate datasets “Orbit”, “Location”, “Temperature”, and “Pressure”.]

HDF5 Reference Datatypes

References to Objects and Dataset Regions

[Figure: file with root group “/” containing “Test Data” and “Viz”. Labels: “References to HDF5 Objects”, “References to dataset regions”, “Group”, “Image 2…..”, “Image 3…..”.]

Reference Datatypes
• Object Reference
  • Unique identifier of an object in a file
  • HDF5 predefined datatype H5T_STD_REF_OBJ
• Dataset Region Reference
  • Unique identifier to a dataset + dataspace selection
  • HDF5 predefined datatype H5T_STD_REF_DSETREG

Conceptual view of HDF5 NPP file

[Figure: an XML User’s Block precedes the root group “/”. Labels: “Product Group”, “Agg”, “Gran n”, “Data”, “Reference Object”, “Reference Region” (twice).]

NPP HDF5 file in HDFView

HDF5 Object References
• h5_objref.py (c,f90)
• Creates a dataset with object references

    group = f.create_group("G1")
    # Scalar dataspace
    dataset = f.create_dataset("DS2", (), 'i')
    # Create object references to a group and a dataset
    refs = (group.ref, dataset.ref)
    ref_type = h5py.special_dtype(ref=h5py.Reference)
    dataset_ref = f.create_dataset("DS1", (2,), ref_type)
    dataset_ref[...] = refs

HDF5 Object References (cont.)
• h5_objref.py (c,f90)
• Finding the object a reference points to:

    f = h5py.File('objref.h5', 'r')
    dataset_ref = f["DS1"]
    print(h5py.check_dtype(ref=dataset_ref.dtype))
    refs = dataset_ref[...]
    refs_list = list(refs)
    for obj in refs_list:
        print(f[obj])

HDF5 Dataset Region References
• h5_regref.py (c,f90)
• Creates a dataset with region references to each row in a dataset

    refs = (dataset.regionref[0,:], …, dataset.regionref[2,:])
    ref_type = h5py.special_dtype(ref=h5py.RegionReference)
    dataset_ref = file.create_dataset("DS1", (3,), ref_type)
    dataset_ref[...] = refs

HDF5 Dataset Region References (cont.)
• h5_regref.py (c,f90)
• Finding a dataset and a data region pointed to by a region reference

    path_name = f[regref].name
    print(path_name)
    # Open the dataset using the pathname we just found
    data = f[path_name]
    # Region reference can be used as a slicing argument!
    print(data[regref])

Hints
• When to use HDF5 object references?
• Instead of an attribute with a lot of data
• Create an attribute of the object reference type and
point to a dataset with the data

• In a dataset to point to related objects in HDF5 file

• When to use HDF5 region references?
• In datasets and attributes to point to a region of
interest
• When accessing the same region many times to
avoid hyperslab selection process

Partial I/O

Working with subsets

Collect data one way ….
Array of images (3D)

Display data another way …

Stitched image (2D array)

Data is too big to read….

How to Describe a Subset in HDF5?
• Before writing and reading a subset of data
one has to describe it to the HDF5 Library.
• HDF5 APIs and documentation refer to a
subset as a “selection” or “hyperslab
selection”.
• If specified, HDF5 Library will perform I/O on a
selection only and not on all elements of a
dataset.

Types of Selections in HDF5
• Two types of selections
• Hyperslab selection
• Regular hyperslab
• Simple hyperslab
• Result of set operations on hyperslabs
(union, difference, …)

• Point selection

• Hyperslab selection is especially important for
doing parallel I/O in HDF5 (See Parallel HDF5
Tutorial)

Regular Hyperslab

Collection of regularly spaced equal size blocks

Simple Hyperslab

Contiguous subset or sub-array

Hyperslab Selection

Result of union operation on three simple hyperslabs

Hyperslab Description
• Start - starting location of a hyperslab (1,1)
• Stride - number of elements that separate each
block (3,2)
• Count - number of blocks (2,6)
• Block - block size (2,1)
• Everything is “measured” in number of elements
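
The four parameters can be turned into explicit coordinates with a few lines of plain Python. This sketch is not an HDF5 call; `hyperslab_coords` is a hypothetical helper, and it assumes stride is measured from the start of one block to the start of the next, as in H5Sselect_hyperslab.

```python
from itertools import product


def hyperslab_coords(start, stride, count, block):
    """List the coordinates selected by a regular hyperslab."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # Block i starts at s + i*st and covers b consecutive elements.
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))


# Values from the slide: start (1,1), stride (3,2), count (2,6), block (2,1).
coords = hyperslab_coords(start=(1, 1), stride=(3, 2), count=(2, 6), block=(2, 1))
rows = sorted({r for r, _ in coords})
cols = sorted({c for _, c in coords})

print(len(coords), rows, cols)
```

With the slide's values the selection holds 2 x 6 blocks of 2 x 1 elements each, 24 elements in total.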

Simple Hyperslab Description
• Two ways to describe a simple hyperslab
• As several blocks
• Stride – (1,1)
• Count – (3,4)
• Block – (1,1)

• As one block
• Stride – (1,1)
• Count – (1,1)
• Block – (3,4)

No performance penalty for
one way or another
Writing and Reading a Hyperslab
• Example h5_hype.py(c, f90)
  • Creates 8x10 integer dataset and populates with data; writes a simple hyperslab (3x4) starting at offset (1,2)
• H5Py uses NumPy indexing to specify a hyperslab
  • NumPy indexing: array[i : j : k]
  • i is the starting index; j is the stopping index; k is the step (≠ 0)

    dataset[1:4, 2:6]

  • In each dimension the slice runs from offset to count+offset

Writing and Reading Simple Hyperslab

    dataset[1:4, 2:6] = 5
    print("Data after selection is written:")
    print(dataset[...])

    [[1 1 1 1 1 2 2 2 2 2]
     [1 1 5 5 5 5 2 2 2 2]
     [1 1 5 5 5 5 2 2 2 2]
     [1 1 5 5 5 5 2 2 2 2]
     [1 1 1 1 1 2 2 2 2 2]
     [1 1 1 1 1 2 2 2 2 2]
     [1 1 1 1 1 2 2 2 2 2]
     [1 1 1 1 1 2 2 2 2 2]]

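The same write can be mimicked on nested Python lists to show how offset and count map to slice bounds. A sketch only: the starting values (1 in columns 0 to 4, 2 in columns 5 to 9) are inferred from the printed output outside the selection, and no HDF5 file is involved.

```python
# 8x10 grid standing in for the dataset (starting values are an assumption).
data = [[1] * 5 + [2] * 5 for _ in range(8)]

offset = (1, 2)   # first selected row and column
count = (3, 4)    # number of selected rows and columns

# Equivalent of dataset[1:4, 2:6] = 5: each slice runs offset .. offset+count.
for row in data[offset[0]:offset[0] + count[0]]:
    row[offset[1]:offset[1] + count[1]] = [5] * count[1]

for row in data:
    print(row)
```
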
Writing and Reading Regular Hyperslab

    space_id = dataset.id.get_space()
    space_id.select_hyperslab((1,1), (2,2), stride=(4,4), block=(2,2))
    dataset.id.read(space_id, space_id, data_selected)
    print(data_selected)

    Selected data read from file....
    [[0 0 0 0 0 0 0 0 0 0]
     [0 1 5 0 0 5 2 0 0 0]
     [0 1 5 0 0 5 2 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]
     [0 1 1 0 0 2 2 0 0 0]
     [0 1 1 0 0 2 2 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]]

Writing and Reading Point Selection
• Example h5_selecelem.py(c, f90)
  • Creates 2 integer datasets and populates with data; writes a point selection at locations (0,1) and (0,3)
• H5Py uses NumPy indexing to specify points in array

    val = (55, 59)
    dataset2[0, [1,3]] = val

    [[ 1 55  1 59]
     [ 1  1  1  1]
     [ 1  1  1  1]]

Hints
• C and Fortran
• Applications’ memory grows with the number of
open handles.
• Don’t keep dataspace handles open if
unnecessary, e.g., when reading hyperslab in a
loop.
• Make sure that selection in a file has the same
number of elements as selection in memory when
doing partial I/O.

Other Features
Storage, Extendibility, Compression

Dataset Storage Options
• Compact
  • Used for storing small (a few KB) data
• Contiguous (default)
  • Used for accessing contiguous subsets of data
• Chunked
  • Data is stored in chunks of predefined size
  • Used when:
    • Appending data
    • Compressing data
    • Accessing non-contiguous data (e.g., columns)

HDF5 Dataset

[Figure: a dataset consists of metadata and dataset data.]

Metadata
• Dataspace: Rank 3; Dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Storage info: Chunked, Compressed

Examples of Data Storage

[Figure: layout of metadata and raw data for compact, contiguous, and chunked storage.]

Extending HDF5 dataset
• Example h5_unlim.py(c,f90)
  • Creates a dataset and appends rows and columns
  • Dataset has to be chunked
  • Chunk sizes do not need to be factors of the dimension sizes

    dataset = f.create_dataset('DS1', (4,7), 'i', chunks=(3,3), maxshape=(None, None))

    0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0

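Because chunk sizes need not divide the dimension sizes, the chunks at the edges extend past the dataset. A small sketch of the arithmetic in plain Python, assuming whole chunks are allocated along each dimension; `chunk_grid` is a hypothetical helper, not part of h5py.

```python
import math


def chunk_grid(shape, chunks):
    """Return (chunks per dimension, extent covered by those chunks)."""
    per_dim = tuple(math.ceil(n / c) for n, c in zip(shape, chunks))
    covered = tuple(k * c for k, c in zip(per_dim, chunks))
    return per_dim, covered


# The (4,7) dataset with (3,3) chunks from the example above.
per_dim, covered = chunk_grid((4, 7), (3, 3))
print(per_dim, covered, math.prod(per_dim))

# After resizing to (6,10) with the same chunk shape.
print(chunk_grid((6, 10), (3, 3)))
```

For the (4,7) dataset this gives a 2 x 3 grid of chunks spanning 6 x 9 elements.
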
Extending HDF5 dataset
• Example h5_unlim.py(c,f90)

    dataset.resize((6,7))
    dataset[4:6] = 1
    dataset.resize((6,10))
    dataset[:,7:10] = 2

    0 0 0 0 0 0 0 2 2 2
    0 0 0 0 0 0 0 2 2 2
    0 0 0 0 0 0 0 2 2 2
    0 0 0 0 0 0 0 2 2 2
    1 1 1 1 1 1 1 2 2 2
    1 1 1 1 1 1 1 2 2 2

HDF5 compression
• Chunking is required for compression and other filters
• HDF5 filters modify data during I/O operations
• Compression filters in HDF5
  • Scale + offset (H5Pset_scaleoffset)
  • N-bit (H5Pset_nbit)
  • GZIP (deflate) (H5Pset_deflate)
  • SZIP (H5Pset_szip)

HDF5 Third-Party Filters
• Compression methods supported by the HDF5 user community
  http://www.hdfgroup.org/services/contributions.html
  • LZF lossless compression (H5Py)
  • BZIP2 lossless compression (PyTables)
  • BLOSC lossless compression (PyTables)
  • LZO lossless compression (PyTables)
  • MAFISC - Modified LZMA compression filter (Multidimensional Adaptive Filtering Improved Scientific data Compression)

Compressing HDF5 dataset
• Example h5_gzip.py(c,f90)
  • Creates compressed dataset using GZIP compression with effort level 9
  • Dataset has to be chunked
  • Write/read/subset as for contiguous (no special steps are needed)

    dataset = f.create_dataset('DS1', (32,64), 'i', chunks=(4,8), compression='gzip', compression_opts=9)
    dataset[...] = data

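The GZIP filter applies deflate compression, which Python's standard `zlib` module also implements. The sketch below compresses a 32x64 block of 4-byte integers at level 9 outside HDF5, just to show the effect of the effort level setting on made-up data; it does not reproduce how the library stores chunks.

```python
import struct
import zlib

ROWS, COLS = 32, 64

# Made-up data: every element holds its row number.
values = [r for r in range(ROWS) for _ in range(COLS)]
raw = struct.pack("<%di" % len(values), *values)    # 4-byte integers

compressed = zlib.compress(raw, 9)                  # effort level 9
restored = zlib.decompress(compressed)

print(len(raw), len(compressed), restored == raw)
```

The round trip is lossless, and how much smaller the compressed form is depends entirely on the data, so real datasets will behave differently from this repetitive sample.
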
Hints
• Do not make chunk sizes too small (e.g., 1x1)!
  • Metadata overhead for each chunk (file space)
  • Each chunk is read at once
    • Many small reads are inefficient
• Some software (H5Py, netCDF-4) may pick a chunk size for you; it may not be what you need
  • Example: Modify h5_gzip.py to use

    dataset = file.create_dataset('DS1', (32,64), 'i', compression='gzip', compression_opts=9)

  • Run h5dump -p -H gzip.h5 to check the chunk size

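One way to see why tiny chunks hurt: each chunk is compressed on its own, so fixed per-chunk costs add up. The sketch below imitates that with `zlib` on made-up data (it is not the HDF5 filter pipeline, and it ignores the per-chunk metadata the library also stores); `compressed_total` is a hypothetical helper.

```python
import struct
import zlib

ROWS, COLS = 32, 64
data = [[r for _ in range(COLS)] for r in range(ROWS)]   # made-up values


def compressed_total(chunk_rows, chunk_cols):
    """Sum of zlib-compressed sizes when every chunk is compressed separately."""
    total = 0
    for r0 in range(0, ROWS, chunk_rows):
        for c0 in range(0, COLS, chunk_cols):
            cells = [data[r][c]
                     for r in range(r0, r0 + chunk_rows)
                     for c in range(c0, c0 + chunk_cols)]
            raw = struct.pack("<%di" % len(cells), *cells)
            total += len(zlib.compress(raw, 9))
    return total


raw_size = ROWS * COLS * 4           # uncompressed size in bytes
tiny = compressed_total(1, 1)        # 2048 chunks of one element each
larger = compressed_total(4, 8)      # 64 chunks of 32 elements each

print(raw_size, tiny, larger)
```

With 1x1 chunks the "compressed" total exceeds the raw size, because every chunk pays the compressor's fixed overhead, while 4x8 chunks compress well on this sample.
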
More Information
• More detailed information on chunking can be found in the “Chunking in HDF5” document at:
  http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html

Thank You!

Acknowledgements
This work was supported by cooperative agreement
number NNX08AO77A from the National
Aeronautics and Space Administration (NASA).
Any opinions, findings, conclusions, or
recommendations expressed in this material are
those of the author[s] and do not necessarily reflect
the views of the National Aeronautics and Space
Administration.

Questions/comments?


More Related Content

PPTX
Introduction to HDF5 Data and Programming Models
PPT
Using HDF5 and Python: The H5py module
PDF
Python and HDF5: Overview
PPT
Substituting HDF5 tools with Python/H5py scripts
PDF
Hdf5 is for Lovers (PyData SV 2013)
Introduction to HDF5 Data and Programming Models
Using HDF5 and Python: The H5py module
Python and HDF5: Overview
Substituting HDF5 tools with Python/H5py scripts
Hdf5 is for Lovers (PyData SV 2013)

What's hot (20)

PPT
The Python Programming Language and HDF5: H5Py
PPTX
HDF Group Support for NPP/NPOESS/JPSS
PPTX
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
PPTX
Tools to improve the usability of NASA HDF Data
PPTX
HDF4 Mapping Project Update
PPSX
NASA HDF/HDF-EOS Data for Dummies (and Developers)
PPT
Introduction to HDF5 Data Model, Programming Model and Library APIs
PPSX
NASA HDF/HDF-EOS Data Access Challenges
PPT
PPT
Migrating from HDF5 1.6 to 1.8
PPT
Digital Object Identifiers for EOSDIS data
PPT
Projection Indexes for HDF5 Datasets
PPTX
Hdf5 parallel
The Python Programming Language and HDF5: H5Py
HDF Group Support for NPP/NPOESS/JPSS
Interoperability with netCDF-4 - Experience with NPP and HDF-EOS5 products
Tools to improve the usability of NASA HDF Data
HDF4 Mapping Project Update
NASA HDF/HDF-EOS Data for Dummies (and Developers)
Introduction to HDF5 Data Model, Programming Model and Library APIs
NASA HDF/HDF-EOS Data Access Challenges
Migrating from HDF5 1.6 to 1.8
Digital Object Identifiers for EOSDIS data
Projection Indexes for HDF5 Datasets
Hdf5 parallel
Ad

Viewers also liked (19)

PPTX
HDF & HDF-EOS Data & Support at NSIDC
PPTX
HDF Tools Updates and Discussions
PPTX
Earth Science Data and Information System (ESDIS) Project Update
PDF
Using IDL with Suomi NPP VIIRS Data
PPTX
Connecting HDF with ISO Metadata Standards
PPTX
Bridging ICESat and ICESat-2 Standard Data Products
PPT
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
PPT
Status of HDF-EOS, Related Software and Tools
PPT
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
PPTX
HDF OPeNDAP Project Update and Demo
PPT
PPTX
HDF Project Status and Plans
PPTX
Web-based On-demand Global NDVI Data Services
PDF
Data Storage for Remote Monitoring of CAT Machines Using HDF
PPTX
MATLAB, netCDF, and OPeNDAP
PPTX
HDF and netCDF Data Support in ArcGIS
PPTX
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
PPTX
iRODS: Interoperability in Data Management
HDF & HDF-EOS Data & Support at NSIDC
HDF Tools Updates and Discussions
Earth Science Data and Information System (ESDIS) Project Update
Using IDL with Suomi NPP VIIRS Data
Connecting HDF with ISO Metadata Standards
Bridging ICESat and ICESat-2 Standard Data Products
HDF-EOS to GeoTIFF Conversion Tool & HDF-EOS Plug-in for HDFView
Status of HDF-EOS, Related Software and Tools
GES DISC Eexperiences with HDF Formats for MEaSUREs Projects
HDF OPeNDAP Project Update and Demo
HDF Project Status and Plans
Web-based On-demand Global NDVI Data Services
Data Storage for Remote Monitoring of CAT Machines Using HDF
MATLAB, netCDF, and OPeNDAP
HDF and netCDF Data Support in ArcGIS
Access HDF-EOS data with OGC Web Coverage Service - Earth Observation Applica...
iRODS: Interoperability in Data Management
Ad

Similar to Advanced HDF5 Features (20)

PPT
HDF5 Advanced Topics - Datatypes and Partial I/O
PPTX
PPT
Hdf5 intro
PDF
Introduction to HDF5 Data Model, Programming Model and Library APIs
PDF
Introduction to HDF5 Data Model, Programming Model and Library APIs
PPT
Using HDF5 tools for performance tuning and troubleshooting
PDF
Introduction to HDF5 for HDF4 Users
PPT
What will be new in HDF5?
PPT
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
PPT
Dimension Scales in HDF-EOS2 and HDF-EOS5
HDF5 Advanced Topics - Datatypes and Partial I/O
Hdf5 intro
Introduction to HDF5 Data Model, Programming Model and Library APIs
Introduction to HDF5 Data Model, Programming Model and Library APIs
Using HDF5 tools for performance tuning and troubleshooting
Introduction to HDF5 for HDF4 Users
What will be new in HDF5?
HDF5 Advanced Topics - Object's Properties, Storage Methods, Filters, Datatypes
Dimension Scales in HDF-EOS2 and HDF-EOS5

More from The HDF-EOS Tools and Information Center (20)

PDF
HDF5 2.0: Cloud Optimized from the Start
PDF
Using a Hierarchical Data Format v5 file as Zarr v3 Shard
PDF
Cloud-Optimized HDF5 Files - Current Status
PDF
Cloud Optimized HDF5 for the ICESat-2 mission
PPTX
Access HDF Data in the Cloud via OPeNDAP Web Service
PPTX
Upcoming New HDF5 Features: Multi-threading, sparse data storage, and encrypt...
PPTX
The State of HDF5 / Dana Robinson / The HDF Group
PDF
Cloud-Optimized HDF5 Files
PDF
Accessing HDF5 data in the cloud with HSDS
PPTX
Highly Scalable Data Service (HSDS) Performance Features
PDF
Creating Cloud-Optimized HDF5 Files
PPTX
HDF5 OPeNDAP Handler Updates, and Performance Discussion
PPTX
Hyrax: Serving Data from S3
PPSX
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
PDF
HDF - Current status and Future Directions
PPSX
HDFEOS.org User Analsys, Updates, and Future
PPTX
HDF - Current status and Future Directions
PDF
H5Coro: The Cloud-Optimized Read-Only Library
PPTX
MATLAB Modernization on HDF5 1.10
HDF5 2.0: Cloud Optimized from the Start
Using a Hierarchical Data Format v5 file as Zarr v3 Shard
Cloud-Optimized HDF5 Files - Current Status
Cloud Optimized HDF5 for the ICESat-2 mission
Access HDF Data in the Cloud via OPeNDAP Web Service
Upcoming New HDF5 Features: Multi-threading, sparse data storage, and encrypt...
The State of HDF5 / Dana Robinson / The HDF Group
Cloud-Optimized HDF5 Files
Accessing HDF5 data in the cloud with HSDS
Highly Scalable Data Service (HSDS) Performance Features
Creating Cloud-Optimized HDF5 Files
HDF5 OPeNDAP Handler Updates, and Performance Discussion
Hyrax: Serving Data from S3
Accessing Cloud Data and Services Using EDL, Pydap, MATLAB
HDF - Current status and Future Directions
HDFEOS.org User Analsys, Updates, and Future
HDF - Current status and Future Directions
H5Coro: The Cloud-Optimized Read-Only Library
MATLAB Modernization on HDF5 1.10

Recently uploaded (20)

PPTX
Big Data Technologies - Introduction.pptx
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
NewMind AI Weekly Chronicles - August'25 Week I
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Empathic Computing: Creating Shared Understanding
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Review of recent advances in non-invasive hemoglobin estimation
PPTX
MYSQL Presentation for SQL database connectivity
PDF
cuic standard and advanced reporting.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPT
Teaching material agriculture food technology
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
Programs and apps: productivity, graphics, security and other tools
Big Data Technologies - Introduction.pptx
Building Integrated photovoltaic BIPV_UPV.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
NewMind AI Weekly Chronicles - August'25 Week I
sap open course for s4hana steps from ECC to s4
Empathic Computing: Creating Shared Understanding
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Encapsulation_ Review paper, used for researhc scholars
Review of recent advances in non-invasive hemoglobin estimation
MYSQL Presentation for SQL database connectivity
cuic standard and advanced reporting.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Teaching material agriculture food technology
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Network Security Unit 5.pdf for BCA BBA.
Diabetes mellitus diagnosis method based random forest with bat algorithm
Chapter 3 Spatial Domain Image Processing.pdf
Per capita expenditure prediction using model stacking based on satellite ima...
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Programs and apps: productivity, graphics, security and other tools

Advanced HDF5 Features

  • 1. HDF5 Advanced Topics Elena Pourmal The HDF Group The 15th HDF and HDF-EOS Workshop April 17, 2012 April 17-19 HDF/HDF-EOS Workshop XV 1
  • 2. Goal • To learn about HDF5 features important for writing portable and efficient applications using H5Py April 17-19 HDF/HDF-EOS Workshop XV 2
  • 3. Outline • Groups and Links • Types of groups and links • Discovering objects in an HDF5 file • Datasets • Datatypes • Partial I/O • Other features • Extensibility • Compression April 17-19 HDF/HDF-EOS Workshop XV 3
  • 4. GROUPS AND LINKS April 17-19 HDF/HDF-EOS Workshop XV 4
  • 5. Groups and Links • Groups are containers for links (graph edges) • Links were added in 1.8.0 • Warning: Many APIs in H5G interface are obsolete - use H5L interfaces to discover and manipulate file structure April 17-19 HDF/HDF-EOS Workshop XV 5
  • 6. Groups and Links HDF5 groups and links organize data objects. / Experiment Notes: Serial Number: 99378920 Date: 3/13/09 Configuration: Standard 3 Every HDF5 file has a root group SimOut Viz lat | lon | temp ----|-----|----12 | 23 | 3.1 15 | 24 | 4.2 17 | 21 | 3.6 Timestep 36,000 April 17-19, 2012 HDF/HDF-EOS Workshop XV 6 Parameters 10;100;1000
  • 7. Example h5_links.py Different kinds of links / links.h5 A B dangling a soft a External Dataset can be “reached” using three paths /A/a /a /soft April 17-19, 2012 HDF/HDF-EOS Workshop XV dset.h5 Dataset is in a different file 7
  • 8. Example h5_links.py Different kinds of links / links.h5 A B dangling a soft Hard links “A” and “B” were created when groups were created Hard link “a” was added to the root group and points to an existing dataset Soft link “soft” points to the existing dataset (cmp. UNIX alias) Soft link “dangling” doesn’t point to any object April 17-19, 2012 HDF/HDF-EOS Workshop XV 8
  • 9. Links • Name • Example: “A”, “B”, “a”, “dangling”, “soft” • Unique within a group; “/” are not allowed in names • Type • Hard Link • Value is object’s address in a file • Created automatically when object is created • Can be added to point to existing object • Soft Link • Value is a string , for example, “/A/a”, but can be anything • Use to create aliases April 17-19 HDF/HDF-EOS Workshop XV 9
  • 10. Links (cont.) • Type • External Link • Value is a pair of strings , for example, (“dset.h5”, “dset” ) • Use to access data in other HDF5 files • Example: For NPP data products geo-location information may be in a separate file April 17-19 HDF/HDF-EOS Workshop XV 10
  • 11. Link Properties • ASCII or UTF-8 encoding for names • Create intermediate groups • Saves programming effort • Group “A” will be created if it doesn’t exist • C example:
        lcpl_id = H5Pcreate(H5P_LINK_CREATE);
        H5Pset_create_intermediate_group(lcpl_id, 1);  /* enable creation of missing groups */
        H5Gcreate(fid, "A/B", lcpl_id, H5P_DEFAULT, H5P_DEFAULT);
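In h5py the intermediate-group behavior is the default, so no property list is needed. A minimal sketch, assuming h5py 3.x and an in-memory file; the names `X/Y/data` are illustrative:

```python
import h5py

f = h5py.File("groups.h5", "w", driver="core", backing_store=False)

# "A" does not exist yet; h5py creates it on the way to "A/B".
f.create_group("A/B")
# The same holds for a dataset created under a nested path.
f.create_dataset("X/Y/data", (4,), dtype="i4")

top_level = sorted(f.keys())
has_b = "B" in f["A"]
f.close()
```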
  • 12. Operations on Links • See the H5L interface in the Reference Manual • Create • Delete • Copy • Iterate • Check if exists
  • 13. Operations on Links • APIs available for C and Fortran • Use dictionary operations in Python • Objects associated with links ARE NOT affected • Deleting a link removes a path to the object • Copying a link doesn’t copy an object
  • 14. Example h5_links.py: link “a” in “A” is removed • [Figure: file links.h5 with root group “/”, groups “A” and “B”, and links “a”, “soft”, “dangling”, and “External”] • The dataset can be “reached” using one path: /a • “External” points to a dataset in a different file, dset.h5
  • 15. Example h5_links.py: link “a” in the root group is removed • [Figure: file links.h5 with root group “/”, groups “A” and “B”, and links “soft”, “dangling”, and “External”] • The dataset is unreachable • “External” points to a dataset in a different file, dset.h5
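The two deletions shown on slides 14 and 15 map onto Python's `del` statement. The sketch below assumes h5py 3.x and an in-memory file, and rebuilds only the links it needs rather than the full h5_links.py layout.

```python
import h5py

f = h5py.File("links.h5", "w", driver="core", backing_store=False)
dset = f.create_dataset("A/a", (10,), dtype="i4")
f["a"] = dset                        # second hard link to the same dataset
f["soft"] = h5py.SoftLink("/A/a")    # alias that stores only the path

del f["A/a"]                         # slide 14: link "a" in group "A" is removed
reachable_after_first = f.get("a") is not None
soft_after_first = f.get("soft")     # the soft link still exists but is now dangling

del f["a"]                           # slide 15: link "a" in the root group is removed
reachable_after_second = f.get("a") is not None
remaining_links = sorted(f.keys())
f.close()
```

Deleting a link never touches the object itself; the dataset only becomes unreachable once no path leads to it.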
  • 16. Group Properties • Creation properties • Type of link storage • Compact (in 1.8.* versions) • Used with a few members (default under 8) • Dense (default behavior) • Used with many (>16) members (default) • Tunable size for a local heap • Save space by providing an estimate of the storage required for link names • Can be compressed (in 1.8.5 and later) • Many links with similar names (XXX-abc, XXX-d, XXX-efgh, etc.) • Requires more time to compress/uncompress data
  • 17. Group Properties • Creation properties • Links may have creation order tracked and indexed • Indexing by name (default) • A, B, a, dangling, soft • Indexing by creation order (has to be enabled) • A, B, a, soft, dangling • http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/api18-c.html
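A sketch of the two orderings on slide 17, assuming an h5py version that accepts the `track_order` argument to `create_group` (2.9 or later). The group names `by_name` and `by_order`, the helper `add_links`, and the dangling target are illustrative.

```python
import h5py

f = h5py.File("order.h5", "w", driver="core", backing_store=False)

def add_links(group):
    """Create the five links from the slide in the order A, B, a, soft, dangling."""
    group.create_group("A")
    group.create_group("B")
    group["a"] = group["A"].create_dataset("a", (1,), dtype="i4")
    group["soft"] = h5py.SoftLink(group.name + "/A/a")
    group["dangling"] = h5py.SoftLink("/no/such/object")   # hypothetical target

by_name = f.create_group("by_name")                       # default: indexed by name
by_order = f.create_group("by_order", track_order=True)   # creation order tracked
add_links(by_name)
add_links(by_order)

names_by_name = list(by_name)
names_by_order = list(by_order)
f.close()
```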
  • 18. Discovering an HDF5 file’s structure • HDF5 provides C and Fortran 2003 APIs for recursive and non-recursive iteration over groups and attributes • H5Ovisit and H5Literate (H5Giterate) • H5Aiterate • Life is much easier with H5Py (h5_visita.py):
        import h5py

        def print_info(name, obj):
            # Print the object's path, then each of its attributes.
            print(name)
            for attr_name, value in obj.attrs.items():
                print(attr_name + ":", value)

        f = h5py.File('GATMO-SATMS-npp.h5', 'r')
        f.visititems(print_info)
        f.close()
  • 19. Checking a path in HDF5 • HDF5 1.8.8 provides high-level (HL) C and Fortran 2003 APIs for checking whether a path exists • H5LTvalid_path (h5ltvalid_path_f) • Example: Is there an object with the path /A/B/C/d? • TRUE if there is a path, FALSE otherwise
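In Python, a comparable check can be written with the `in` operator on a file or group. This is the h5py idiom rather than a wrapper around H5LTvalid_path, and the sketch assumes h5py 3.x and an in-memory file.

```python
import h5py

f = h5py.File("paths.h5", "w", driver="core", backing_store=False)
f.create_dataset("A/B/C/d", (1,), dtype="i4")

# Membership tests accept multi-component paths.
found = "/A/B/C/d" in f
missing_leaf = "/A/B/C/x" in f
missing_middle = "/A/X/C/d" in f
f.close()
```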
  • 20. Hints • Use the latest file format (see the H5Pset_libver_bounds function in the Reference Manual) • Saves space when creating a lot of groups in a file • Saves time when accessing many objects (>1000) • Caution: Tools built with HDF5 versions prior to 1.8.0 will not work on files created with this property
  • 23. HDF5 Datatypes • Integer and floating point • String • Compound • Similar to C structures or Fortran Derived Types • Array • References • Variable-length • Enum • Opaque
  • 24. HDF5 Datatypes • Datatype descriptions • Are stored in the HDF5 file with the data • Include encoding (e.g., byte order, size, and floating point representation) and other information to assure portability across platforms • See C, Fortran, MATLAB and Java examples under http://www.hdfgroup.org/ftp/HDF5/examples/
  • 25. Data Portability in HDF5 • Array of integers on Intel platform: int is little-endian, 4 bytes • Array of long integers on SPARC64 platform: long is big-endian, 8 bytes • [Figure: the int array is written with H5Dwrite and stored as H5T_STD_I32LE; H5Dread delivers it as a long array]
  • 26. Data Portability in HDF5 (cont.)
        /* We use the native integer type to describe data in a file */
        dset = H5Dcreate(file,NAME,H5T_NATIVE_INT,…
        /* Description of data in a buffer */
        H5Dwrite(dset,H5T_NATIVE_INT,…,buf);
        /* Description of data in a buffer; the library will perform
           conversion from 4-byte LE to 8-byte BE integers */
        H5Dread(dset,H5T_NATIVE_LONG,…, buf);
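The same conversion can be observed from Python on a single machine by asking for a memory type that differs from the file type. A minimal sketch, assuming h5py 3.x; the dataset name `ints` is illustrative, and the byte orders are spelled out explicitly instead of relying on the platform's native types.

```python
import h5py
import numpy as np

f = h5py.File("portable.h5", "w", driver="core", backing_store=False)

# File datatype: 32-bit little-endian integer (H5T_STD_I32LE).
dset = f.create_dataset("ints", data=np.arange(5), dtype="<i4")
file_dtype = dset.dtype

# Memory datatype: 64-bit big-endian integer; HDF5 converts while reading.
as_long = dset.astype(">i8")[...]
f.close()
```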
  • 27. Hints • Avoid datatype conversion if possible • Store only the precision you need to save space in a file • Starting with HDF5 1.8.7, Fortran APIs support different kinds of integers and floats (if the Fortran 2003 feature is enabled)
  • 29. HDF5 Strings • Fixed length • Data elements have to have the same size • Short strings will use more bytes than needed • Application is responsible for providing buffers of the correct size on read • Variable length • Data elements may not have the same size • Writing/reading strings is “easy”; library handles memory allocations
  • 30. HDF5 Strings – Fixed-length • Example h5_string.py(c,f90) • Stores four strings, “Parting”, “.is such”, “.sweet”, “.sorrow...”, in a dataset • Strings have length 10 • Python uses NULL-padded strings (default)
        fixed_string = np.dtype('S10')
        dataset = file.create_dataset("DSfixed", (4,), dtype=fixed_string)
        data = ("Parting", ".is such", ".sweet", ".sorrow...")
        dataset[...] = data
  • 31. HDF5 Strings • Example h5_vlstring.py(c,f90) • Stores four strings, “Parting”, “ is such”, “ sweet”, “ sorrow...”, in a dataset • Strings have length 7, 8, 6, 10
        str_type = h5py.special_dtype(vlen=str)
        dataset = file.create_dataset("DSvariable", (4,), dtype=str_type)
        data = ("Parting", " is such", " sweet", " sorrow...")
        dataset[...] = data
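Both string flavors can be compared side by side in one self-contained sketch. It assumes h5py 3.x with NumPy and an in-memory file; the decoding step at the end is needed because h5py 3 returns variable-length strings as bytes by default.

```python
import h5py
import numpy as np

f = h5py.File("strings.h5", "w", driver="core", backing_store=False)
words = ["Parting", " is such", " sweet", " sorrow..."]

# Fixed length: every element occupies 10 bytes; shorter strings are NULL padded.
fixed = f.create_dataset("DSfixed", data=np.array(words, dtype="S10"))

# Variable length: each element keeps its own length.
vlen_type = h5py.special_dtype(vlen=str)
variable = f.create_dataset("DSvariable", data=words, dtype=vlen_type)

fixed_itemsize = fixed.dtype.itemsize
fixed_values = [s.decode("ascii") for s in fixed[...]]
variable_values = [s.decode("utf-8") if isinstance(s, bytes) else s
                   for s in variable[...]]
f.close()
```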
  • 32. Hints • Fixed length strings • Can be compressed • Use when you need to store a lot of strings • Variable-length strings • Compression cannot be applied to data • Use for attributes and a few strings if space is a concern
  • 33. HDF5 Compound Datatypes
  • 34. HDF5 Compound Datatypes • Compound types • Comparable to C structures or Fortran 90 Derived Types • Members can be of any datatype • Data elements can be written/read by a single field or a set of fields
  • 35. Creating and Writing Compound Dataset • Example h5_compound.py(c,f90) • Stores four records in the dataset
        Orbit (integer) | Location (string) | Temperature (F) (64-bit float) | Pressure (inHg) (64-bit float)
        ----------------|-------------------|--------------------------------|-------------------------------
        1153            | Sun               | 53.23                          | 24.57
        1184            | Moon              | 55.12                          | 22.95
        1027            | Venus             | 103.55                         | 31.33
        1313            | Mars              | 1252.89                        | 84.11
  • 36. Creating and Writing Compound Dataset
        comp_type = np.dtype([('Orbit', 'i'), ('Location', np.bytes_, 6), ….])
        dataset = file.create_dataset("DSC", (4,), comp_type)
        dataset[...] = data
    Note for C and Fortran 2003 users: • You’ll need to construct memory and file datatypes • Use the HOFFSET macro instead of calculating offsets by hand • Order of H5Tinsert calls is not important if HOFFSET is used
  • 37. Reading Compound Dataset
        f = h5py.File('compound.h5', 'r')
        dataset = f["DSC"]
        ….
        orbit = dataset['Orbit']
        print("Orbit: ", orbit)
        data = dataset[...]
        print(data)
        ….
        print(dataset[2, 'Location'])
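Slides 35 to 37 can be combined into one runnable sketch. The field widths (`i4`, `S6`, `f8`) are assumptions chosen to match the table on slide 35, since the slide elides part of the datatype definition; an in-memory file and h5py 3.x are assumed as well.

```python
import h5py
import numpy as np

# Assumed layout: 32-bit integer, 6-byte string, two 64-bit floats.
comp_type = np.dtype([
    ("Orbit", "i4"),
    ("Location", "S6"),
    ("Temperature", "f8"),
    ("Pressure", "f8"),
])
records = np.array([
    (1153, b"Sun", 53.23, 24.57),
    (1184, b"Moon", 55.12, 22.95),
    (1027, b"Venus", 103.55, 31.33),
    (1313, b"Mars", 1252.89, 84.11),
], dtype=comp_type)

f = h5py.File("compound.h5", "w", driver="core", backing_store=False)
dataset = f.create_dataset("DSC", (4,), dtype=comp_type)
dataset[...] = records

orbit = dataset["Orbit"]             # one field of every record
location = dataset[2, "Location"]    # one field of one record
whole = dataset[...]                 # all records, all fields
f.close()
```

Reading by field name transfers only that field, which is the partial access slide 34 refers to.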
  • 38. Fortran 2003 • HDF5 Fortran library 1.8.8 with Fortran 2003 enabled has the same capabilities for writing derived types as the C library • H5OFFSET function • No need to write/read by fields as before
  • 39. Hints • When to use compound datatypes? • Application needs access to the whole record • When not to use compound datatypes? • Application needs access to specific fields often • Store the field in a dataset • [Figure: two layouts under “/”: a single dataset “DSC”, or separate datasets “Pressure”, “Orbit”, “Location”, “Temperature”]
  • 40. HDF5 Reference Datatypes
  • 41. References to Objects and Dataset Regions • [Figure: root group “/” with “Test Data” and “Viz”; references to HDF5 objects (Group, Image 2, Image 3) and references to dataset regions]
  • 42. Reference Datatypes • Object Reference • Unique identifier of an object in a file • HDF5 predefined datatype H5T_STD_REF_OBJ • Dataset Region Reference • Unique identifier to a dataset + dataspace selection • HDF5 predefined datatype H5T_STD_REF_DSETREG
  • 43. Conceptual view of HDF5 NPP file • [Figure labels: XML User’s Block, Root “/”, Product Group, Agg, Data, Gran n, Reference Object, Reference Region]
  • 44. NPP HDF5 file in HDFView
  • 45. HDF5 Object References • h5_objref.py (c,f90) • Creates a dataset with object references
        group = f.create_group("G1")
        # Scalar dataspace
        dataset = f.create_dataset("DS2", (), 'i')
        # Create object references to a group and a dataset
        refs = (group.ref, dataset.ref)
        ref_type = h5py.h5t.special_dtype(ref=h5py.Reference)
        dataset_ref = f.create_dataset("DS1", (2,), ref_type)
        dataset_ref[...] = refs
  • 46. HDF5 Object References (cont.) • h5_objref.py (c,f90) • Finding the object a reference points to:
        f = h5py.File('objref.h5', 'r')
        dataset_ref = f["DS1"]
        print(h5py.h5t.check_dtype(ref=dataset_ref.dtype))
        refs = dataset_ref[...]
        refs_list = list(refs)
        for obj in refs_list:
            print(f[obj])
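The write and read halves from slides 45 and 46 fit into one self-contained sketch. It assumes h5py 3.x and an in-memory file, and it stores the references one element at a time, which is the form shown in the h5py documentation.

```python
import h5py

f = h5py.File("objref.h5", "w", driver="core", backing_store=False)
group = f.create_group("G1")
dataset = f.create_dataset("DS2", (), dtype="i4")    # scalar dataspace

ref_type = h5py.special_dtype(ref=h5py.Reference)
dataset_ref = f.create_dataset("DS1", (2,), dtype=ref_type)
dataset_ref[0] = group.ref
dataset_ref[1] = dataset.ref

# Dereference: indexing the file with a reference opens the target object.
is_ref_type = h5py.check_dtype(ref=dataset_ref.dtype) is h5py.Reference
targets = [f[ref].name for ref in dataset_ref[...]]
f.close()
```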
  • 47. HDF5 Dataset Region References • h5_regref.py (c,f90) • Creates a dataset with region references to each row in a dataset
        refs = (dataset.regionref[0,:],…,dataset.regionref[2,:])
        ref_type = h5py.h5t.special_dtype(ref=h5py.RegionReference)
        dataset_ref = f.create_dataset("DS1", (3,), ref_type)
        dataset_ref[...] = refs
  • 48. HDF5 Dataset Region References (cont.) • h5_regref.py (c,f90) • Finding a dataset and a data region pointed to by a region reference
        path_name = f[regref].name
        print(path_name)
        # Open the dataset using the pathname we just found
        data = f[path_name]
        # Region reference can be used as a slicing argument!
        print(data[regref])
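A self-contained variant of the region-reference example, assuming h5py 3.x and an in-memory file. The 3x4 source dataset and its contents are illustrative. The result is flattened because the shape h5py returns for a region read is not something this sketch relies on.

```python
import h5py
import numpy as np

f = h5py.File("regref.h5", "w", driver="core", backing_store=False)
dataset = f.create_dataset("DS2", data=np.arange(12, dtype="i4").reshape(3, 4))

# One region reference per row of DS2.
ref_type = h5py.special_dtype(ref=h5py.RegionReference)
dataset_ref = f.create_dataset("DS1", (3,), dtype=ref_type)
for row in range(3):
    dataset_ref[row] = dataset.regionref[row, :]

regref = dataset_ref[1]
path_name = f[regref].name                             # which dataset the reference points into
selected = np.asarray(f[path_name][regref]).ravel()    # the referenced elements
f.close()
```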
  • 49. Hints • When to use HDF5 object references? • Instead of an attribute with a lot of data • Create an attribute of the object reference type and point to a dataset with the data • In a dataset to point to related objects in an HDF5 file • When to use HDF5 region references? • In datasets and attributes to point to a region of interest • When accessing the same region many times to avoid the hyperslab selection process
  • 50. Partial I/O: Working with subsets
  • 51. Collect data one way …. • Array of images (3D)
  • 52. Display data another way … • Stitched image (2D array)
  • 53. Data is too big to read….
  • 54. How to Describe a Subset in HDF5? • Before writing and reading a subset of data one has to describe it to the HDF5 Library. • HDF5 APIs and documentation refer to a subset as a “selection” or “hyperslab selection”. • If specified, HDF5 Library will perform I/O on a selection only and not on all elements of a dataset.
  • 55. Types of Selections in HDF5 • Two types of selections • Hyperslab selection • Regular hyperslab • Simple hyperslab • Result of set operations on hyperslabs (union, difference, …) • Point selection • Hyperslab selection is especially important for doing parallel I/O in HDF5 (See Parallel HDF5 Tutorial)
  • 56. Regular Hyperslab • Collection of regularly spaced equal size blocks
  • 57. Simple Hyperslab • Contiguous subset or sub-array
  • 58. Hyperslab Selection • Result of union operation on three simple hyperslabs
  • 59. Hyperslab Description • Start - starting location of a hyperslab (1,1) • Stride - number of elements that separate each block (3,2) • Count - number of blocks (2,6) • Block - block size (2,1) • Everything is “measured” in number of elements
  • 60. Simple Hyperslab Description • Two ways to describe a simple hyperslab • As several blocks • Stride – (1,1) • Count – (3,4) • Block – (1,1) • As one block • Stride – (1,1) • Count – (1,1) • Block – (3,4) • No performance penalty for one way or another
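The start/stride/count/block description can be made concrete with a few lines of plain Python that list the selected indices. The helper names below are illustrative and not part of HDF5 or h5py, and the (0, 0) start used for the simple hyperslab is an arbitrary choice.

```python
from itertools import product

def hyperslab_indices(start, stride, count, block):
    """Return the indices selected along one dimension."""
    return [start + i * stride + j for i in range(count) for j in range(block)]

def hyperslab_elements(start, stride, count, block):
    """Return all selected coordinates of an N-dimensional hyperslab."""
    per_dim = [hyperslab_indices(*dim) for dim in zip(start, stride, count, block)]
    return list(product(*per_dim))

# The regular hyperslab from slide 59, one dimension at a time.
rows = hyperslab_indices(start=1, stride=3, count=2, block=2)
cols = hyperslab_indices(start=1, stride=2, count=6, block=1)

# The 3x4 simple hyperslab from slide 60, described two ways.
as_blocks = hyperslab_elements((0, 0), (1, 1), (3, 4), (1, 1))
as_one_block = hyperslab_elements((0, 0), (1, 1), (1, 1), (3, 4))
```

Both descriptions of the simple hyperslab select the same twelve elements, which is why the choice between them is a matter of convenience.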
  • 61. Writing and Reading a Hyperslab • Example h5_hype.py(c, f90) • Creates 8x10 integer dataset and populates with data; writes a simple hyperslab (3x4) starting at offset (1,2) • H5Py uses NumPy indexing to specify a hyperslab • NumPy indexing array[i : j : k] • i – the starting index; j – the stopping index; k – is the step (≠ 0) • dataset[1:4, 2:6]: each slice runs from offset to count+offset
  • 62. Writing and Reading Simple Hyperslab
        dataset[1:4, 2:6] = 5
        print("Data after selection is written:")
        print(dataset[...])
        [[1 1 1 1 1 2 2 2 2 2]
         [1 1 5 5 5 5 2 2 2 2]
         [1 1 5 5 5 5 2 2 2 2]
         [1 1 5 5 5 5 2 2 2 2]
         [1 1 1 1 1 2 2 2 2 2]
         [1 1 1 1 1 2 2 2 2 2]
         [1 1 1 1 1 2 2 2 2 2]
         [1 1 1 1 1 2 2 2 2 2]]
  • 63. Writing and Reading Regular Hyperslab
        space_id = dataset.id.get_space()
        space_id.select_hyperslab((1,1), (2,2), stride=(4,4), block=(2,2))
        dataset.id.read(space_id, space_id, data_selected)
        print(data_selected)
    Selected data read from file....
        [[0 0 0 0 0 0 0 0 0 0]
         [0 1 5 0 0 5 2 0 0 0]
         [0 1 5 0 0 5 2 0 0 0]
         [0 0 0 0 0 0 0 0 0 0]
         [0 0 0 0 0 0 0 0 0 0]
         [0 1 1 0 0 2 2 0 0 0]
         [0 1 1 0 0 2 2 0 0 0]
         [0 0 0 0 0 0 0 0 0 0]]
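Slides 61 to 63 can be reproduced end to end with the sketch below. It assumes h5py 3.x and an in-memory file; the dataset name `IntArray` and the way the initial values are generated are assumptions, chosen to match the printed output on slide 62.

```python
import h5py
import numpy as np

f = h5py.File("hype.h5", "w", driver="core", backing_store=False)

# 8x10 dataset: 1 in the first five columns, 2 in the last five.
initial = np.ones((8, 10), dtype="i4")
initial[:, 5:] = 2
dataset = f.create_dataset("IntArray", data=initial)

# Simple hyperslab: a 3x4 block at offset (1, 2).
dataset[1:4, 2:6] = 5
after_write = dataset[...]

# Regular hyperslab: start (1,1), count (2,2), stride (4,4), block (2,2).
space_id = dataset.id.get_space()
space_id.select_hyperslab((1, 1), (2, 2), stride=(4, 4), block=(2, 2))
data_selected = np.zeros((8, 10), dtype="i4")
# The same selection is used in memory and in the file, so values keep their positions.
dataset.id.read(space_id, space_id, data_selected)
f.close()
```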
  • 64. Writing and Reading Point Selection • Example h5_selecelem.py(c, f90) • Creates 2 integer datasets and populates with data; writes a point selection at locations (0,1) and (0,3) • H5Py uses NumPy indexing to specify points in array
        val = (55,59)
        dataset2[0, [1,3]] = val
        [[ 1 55  1 59]
         [ 1  1  1  1]
         [ 1  1  1  1]]
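A self-contained version of the point selection, assuming h5py 3.x and an in-memory file. The 3x4 shape and the initial value 1 follow the printed output on slide 64; the dataset name `DS2` is illustrative.

```python
import h5py
import numpy as np

f = h5py.File("points.h5", "w", driver="core", backing_store=False)
dataset2 = f.create_dataset("DS2", data=np.ones((3, 4), dtype="i4"))

# A list in one dimension selects individual elements: (0,1) and (0,3).
dataset2[0, [1, 3]] = (55, 59)
result = dataset2[...]
picked = dataset2[0, [1, 3]]
f.close()
```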
  • 65. Hints • C and Fortran • Applications’ memory grows with the number of open handles. • Don’t keep dataspace handles open if unnecessary, e.g., when reading hyperslabs in a loop. • Make sure that the selection in a file has the same number of elements as the selection in memory when doing partial I/O.
  • 66. Other Features: Storage, Extensibility, Compression
  • 67. Dataset Storage Options • Compact • Used for storing small (a few KB) data • Contiguous (default) • Used for accessing contiguous subsets of data • Chunked • Data is stored in chunks of predefined size • Used when: • Appending data • Compressing data • Accessing non-contiguous data (e.g., columns)
  • 68. HDF5 Dataset • Metadata • Dataspace: Rank 3; Dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7 • Datatype: IEEE 32-bit float • Attributes: Time = 32.4, Pressure = 987, Temp = 56 • Storage info: Chunked, Compressed • Dataset data
  • 69. Examples of Data Storage • [Figure: placement of metadata and raw data for Compact, Contiguous, and Chunked storage]
  • 70. Extending HDF5 dataset • Example h5_unlim.py(c,f90) • Creates a dataset and appends rows and columns • Dataset has to be chunked • Chunk sizes do not need to be factors of the dimension sizes
        dataset = f.create_dataset('DS1', (4,7), 'i', chunks=(3,3), maxshape=(None, None))
    [Figure: the 4x7 dataset of zeros laid out on a grid of 3x3 chunks, which covers 6x9 elements]
  • 71. Extending HDF5 dataset • Example h5_unlim.py(c,f90)
        dataset.resize((6,7))
        dataset[4:6] = 1
        dataset.resize((6,10))
        dataset[:,7:10] = 2
        [[0 0 0 0 0 0 0 2 2 2]
         [0 0 0 0 0 0 0 2 2 2]
         [0 0 0 0 0 0 0 2 2 2]
         [0 0 0 0 0 0 0 2 2 2]
         [1 1 1 1 1 1 1 2 2 2]
         [1 1 1 1 1 1 1 2 2 2]]
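The two slides on extending a dataset combine into the following runnable sketch, which assumes h5py 3.x and an in-memory file instead of the file written by h5_unlim.py.

```python
import h5py

f = h5py.File("unlim.h5", "w", driver="core", backing_store=False)

# Chunked and unlimited in both dimensions; new elements read back as 0.
dataset = f.create_dataset("DS1", (4, 7), dtype="i4",
                           chunks=(3, 3), maxshape=(None, None))
shape_before = dataset.shape

dataset.resize((6, 7))       # append two rows
dataset[4:6] = 1
dataset.resize((6, 10))      # append three columns
dataset[:, 7:10] = 2

shape_after = dataset.shape
final = dataset[...]
f.close()
```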
  • 72. HDF5 compression • Chunking is required for compression and other filters • HDF5 filters modify data during I/O operations • Compression filters in HDF5 • Scale + offset (H5Pset_scaleoffset) • N-bit (H5Pset_nbit) • GZIP (deflate) (H5Pset_deflate) • SZIP (H5Pset_szip)
  • 73. HDF5 Third-Party Filters • Compression methods supported by the HDF5 user community http://www.hdfgroup.org/services/contributions.html • LZF lossless compression (H5Py) • BZIP2 lossless compression (PyTables) • BLOSC lossless compression (PyTables) • LZO lossless compression (PyTables) • MAFISC - Modified LZMA compression filter (Multidimensional Adaptive Filtering Improved Scientific data Compression)
  • 74. Compressing HDF5 dataset • Example h5_gzip.py(c,f90) • Creates compressed dataset using GZIP compression with effort level 9 • Dataset has to be chunked • Write/read/subset as for contiguous (no special steps are needed)
        dataset = f.create_dataset('DS1', (32,64), 'i', chunks=(4,8), compression='gzip', compression_opts=9)
        dataset[...] = data
  • 75. Hints • Do not make chunk sizes too small (e.g., 1x1)! • Metadata overhead for each chunk (file space) • Each chunk is read at once • Many small reads are inefficient • Some software (H5Py, netCDF-4) may pick a chunk size for you; it may not be what you need • Example: Modify h5_gzip.py to use
        dataset = file.create_dataset('DS1', (32,64), 'i', compression='gzip', compression_opts=9)
    Run h5dump -p -H gzip.h5 to check chunk size
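The chunk shape can also be checked from Python instead of h5dump. The sketch below assumes h5py 3.x with NumPy and an in-memory file; the data values and the second dataset name `DS2` are illustrative. It compares an explicit chunk shape with the one h5py guesses when none is given.

```python
import h5py
import numpy as np

f = h5py.File("gzip.h5", "w", driver="core", backing_store=False)
data = np.arange(32 * 64, dtype="i4").reshape(32, 64)

# Explicit chunk shape, GZIP (deflate) at effort level 9.
explicit = f.create_dataset("DS1", data=data, chunks=(4, 8),
                            compression="gzip", compression_opts=9)
# No chunks argument: h5py picks a chunk shape because compression needs one.
guessed = f.create_dataset("DS2", data=data,
                           compression="gzip", compression_opts=9)

explicit_info = (explicit.chunks, explicit.compression, explicit.compression_opts)
guessed_chunks = guessed.chunks
round_trip_ok = bool((explicit[...] == data).all())
f.close()
```

Printing `guessed_chunks` shows what the library chose, which is worth comparing against the access pattern of the application.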
  • 76. More Information • More detailed information on chunking can be found in the “Chunking in HDF5” document at: http://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/index.html
  • 78. Acknowledgements This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Editor's Notes

  • #8: Example h5_links.py creates a file links.h5 and two groups “A” and “B” in it. Then it creates a one-dimensional array “a” in group “A”. After the dataset was created, a hard link “a” was added to the root group. (It is one-dimensional in the example and doesn’t have data.) Also, a soft link with the value “/A/a” was added to the root group along with the dangling soft link “dangling”. External link “External” was added to group B. It points to a dataset “dset” in dset.h5.
  • #15: When link a in A is removed, the dataset can be reached using path /a; the soft link becomes dangling.
  • #16: When link a in the root group is removed, dataset with graph becomes unreachable; the soft link cannot be resolved without a “real” path.
  • #25: Portability: ensures endianness conversion, size conversion, structure portability, etc.
  • #27: Failing to describe the memory buffer correctly, or providing an “unconvertible” type, will result in wrong data being read or written.
  • #44: This slide is from HDF and HDF-EOS Workshop IX, Kim Tomashosky, “NPOESS Product Delivery in HDF”.