V6.0
Getting Started With HDF5
• Why have we brought in a new data format?
• What actually is HDF5?
• How do I create HDF5 files?
• How do I read in HDF5 files?
– Reading one file at a time
– Reading multiple files and selections
• Points to Note
• Future Developments
SEGY is great but…
• It is designed to be read sequentially from tape
– and our “index” file solution didn’t scale well to “big data”
– and our index file solution only allowed primary key access
• It only has 240 bytes of 32-bit integer headers defined
– and our extended trace headers didn’t scale well to “big data”
• Some processes require “n-key random access”
– “surface consistent” suite, PreSTM, 3DSRME etc.
• You need to read the whole file to access trace headers
– Some “database” systems offer more flexibility
• Parallel I/O doesn’t scale well on large clusters
So what is HDF5?
• Developed over the last 20 years
• Initially by the National Center for Supercomputing Applications http://www.ncsa.illinois.edu/
• Now developed by The HDF Group http://www.hdfgroup.org
• A suite of technologies, not just a file format
• General purpose library and file format for storing scientific data
• Fully supported set of command line tools, APIs and interfaces
• A pan-industry open standard
• Used for storage by both MATLAB and Scilab; can be read by Mathematica
• A self-describing format
• No ambiguity about integer or floating point types or storage in trace bytes
• Names can be allocated to components, as you would in a database structure
• Built for “big data”
• Petabyte+ scale datasets running on tens of thousands of cores
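To illustrate the "no ambiguity" point, here is a minimal Python sketch (standard library only, not Claritas code). It shows how the same four header bytes give two different values depending on whether the reader assumes a float or an integer. In a byte-oriented header that knowledge has to come from outside the file; HDF5 stores the datatype alongside each dataset, so the reader does not have to guess. The byte values are made up for illustration.

```python
import struct

# Four raw bytes, as they might sit in a header slot of a byte-oriented format.
# Here they were written as a big-endian IEEE 754 float holding 1.0.
raw = struct.pack(">f", 1.0)

# Without knowing the intended type, both interpretations are "valid" reads.
as_float = struct.unpack(">f", raw)[0]
as_int = struct.unpack(">i", raw)[0]

print(as_float)  # 1.0
print(as_int)    # 1065353216
```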
Our Implementation of HDF5
HDFView 2.9: free, third-party tool, showing how any HDF5 application can open the new format
Data, Processing History, 400-byte reel header, 3200-byte text header, history and trace headers from Claritas extended SEGY all present
Seismic samples displayed graphically – could also be displayed as a table
All trace headers – SEGY 240-byte and extended – opened in a spreadsheet; full mathematical operations
We have “encapsulated” the GLOBE Claritas SEGY in HDF5
The 400-byte binary reel header opened as a table, so that values can be edited or modified
Creating HDF5 Files : SEISWRITE
Specify a file name!
Optimisation controls; these have smart defaults set and can be modified for managing very large datasets where you know that non-sequential read-access will be needed, or partial read of trace samples will be required
Replaces current use of DISCWRITE, although this will continue to be available
New functionality development will focus on SEISWRITE and HDF5 format data
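HDF5 itself supports chunked storage, where a dataset is split into fixed-size blocks that are read as whole units. The slides do not say what the SEISWRITE optimisation controls actually set, so the sketch below only assumes they relate to something like chunk shape. The function name, the chunk shapes and the dataset size are all hypothetical. It counts how many chunks a partial read of trace samples would touch under two layouts.

```python
def chunks_touched(first_trace, last_trace, first_sample, last_sample, chunk_shape):
    """Count the chunks intersecting a rectangular (trace, sample) read.

    Index ranges are inclusive; chunk_shape is (traces_per_chunk, samples_per_chunk).
    """
    traces_per_chunk, samples_per_chunk = chunk_shape
    n_trace_chunks = last_trace // traces_per_chunk - first_trace // traces_per_chunk + 1
    n_sample_chunks = last_sample // samples_per_chunk - first_sample // samples_per_chunk + 1
    return n_trace_chunks * n_sample_chunks


# Hypothetical case: traces of 4000 samples; read the first 500 samples of traces 0-999.
whole_trace_layout = (1, 4000)   # one chunk per complete trace
windowed_layout = (100, 500)     # chunks of 100 traces x 500 samples

whole_trace_chunks = chunks_touched(0, 999, 0, 499, whole_trace_layout)
windowed_chunks = chunks_touched(0, 999, 0, 499, windowed_layout)

# Samples actually pulled from disc = chunks touched x samples per chunk.
print(whole_trace_chunks * 1 * 4000)   # 4000000
print(windowed_chunks * 100 * 500)     # 500000
```

In this made-up case the windowed layout reads only the 500,000 samples requested, while the one-chunk-per-trace layout reads eight times as much. That is the kind of trade-off such controls would let you tune.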
Reading HDF5 files : SEISREAD
With HDF5 format, you use SEISREAD in place of the DISCxxxxx Modules
You don’t need to worry about the order of data on disc, just how you want to read it
Simple Reading
File Name
Primary key order; default is all, ascending
Secondary key order; default is all, ascending
Tertiary key order; only when needed
You can read data in ANY order; original order doesn’t matter
Selection and Repeats
6 repeat copies specified
Primary key SHOTID, with only SHOTID 900 selected; note tolerance
Secondary key CHANNEL, all selected, in ascending order (default)
Six copies of SHOTID 900 passed to the processing flow, with REPEAT set from 1 to 6
More Complex Selections
Two copies of SHOTIDs from 100 to 900 with an increment of 100, all channels in ascending order, with REPEAT set to 1 and 2
More complex SHOTID selection using the same syntax as DISCREAD; note tolerance is set to 0
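The selection and repeat behaviour described on these slides can be mimicked in a few lines of Python. This is only a sketch, not SEISREAD itself. The function names are hypothetical, the tolerance is assumed to mean "absolute difference from the requested value", and emitting one complete copy after another is an assumption about the output order.

```python
def expand_range(start, stop, step):
    """Expand an inclusive start-to-stop selection with an increment into key values."""
    return list(range(start, stop + 1, step))


def select_with_repeats(traces, key, wanted, tolerance=0, repeats=1):
    """Return copies of the traces whose key lies within tolerance of a wanted value.

    Each copy carries a REPEAT header counting from 1.
    """
    out = []
    for repeat in range(1, repeats + 1):
        for trace in traces:
            if any(abs(trace[key] - w) <= tolerance for w in wanted):
                out.append({**trace, "REPEAT": repeat})
    return out


# Hypothetical headers: shots 100 to 1000 in steps of 50, two channels each.
traces = [{"SHOTID": s, "CHANNEL": c} for s in range(100, 1001, 50) for c in (1, 2)]

# Two copies of SHOTIDs 100 to 900, increment 100, tolerance 0.
wanted = expand_range(100, 900, 100)
picked = select_with_repeats(traces, "SHOTID", wanted, tolerance=0, repeats=2)

# Six copies of SHOTID 900 only.
six_copies = select_with_repeats(traces, "SHOTID", [900], tolerance=0, repeats=6)

print(len(picked), len(six_copies))  # 36 12
```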
Sorting to CDP (DISCGATH)
Identical to simple reading
Specify CDP as primary key
Specify CDPTRACE as secondary key
Default is to read all data in ascending primary/secondary key order
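The idea that the read order is independent of the storage order can be sketched as sorting an index of trace headers rather than the traces themselves. The headers and the `read_order` helper below are hypothetical and only illustrate the principle; they say nothing about how SEISREAD is implemented.

```python
# Hypothetical trace headers, listed in the order they sit on disc (shot order).
headers = [
    {"SHOTID": 1, "CHANNEL": 1, "CDP": 11, "CDPTRACE": 1},
    {"SHOTID": 1, "CHANNEL": 2, "CDP": 10, "CDPTRACE": 1},
    {"SHOTID": 2, "CHANNEL": 1, "CDP": 11, "CDPTRACE": 2},
    {"SHOTID": 2, "CHANNEL": 2, "CDP": 10, "CDPTRACE": 2},
]


def read_order(headers, primary, secondary):
    """Return storage positions in ascending primary/secondary key order."""
    return sorted(
        range(len(headers)),
        key=lambda i: (headers[i][primary], headers[i][secondary]),
    )


cdp_order = read_order(headers, "CDP", "CDPTRACE")
shot_order = read_order(headers, "SHOTID", "CHANNEL")

print(cdp_order)   # [1, 3, 0, 2]
print(shot_order)  # [0, 1, 2, 3]
```

With random access to the file, the traces can then be fetched position by position in whichever order was requested.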
Reading Multiple Files
Seismic File List used in the same format as with DISCREAD, with selections
SETREPEAT parameter used as per DISCREAD to create panels; files are merged if this is “no”
Primary Key defined here is used in the Seismic File List definition
This last file has a “native” ordering of CDP, CDPTRACE, but will be sorted to SHOT, CHANNEL order on read, automatically
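The difference between panels and merged files can be sketched as follows. This is a guess at the behaviour, not a description of the module: it assumes that with the repeat option on, each file becomes one panel tagged with its own REPEAT value, and that with it off, all files are pooled and sorted into one primary/secondary key order. The `read_file_list` function and the sample headers are hypothetical.

```python
def read_file_list(files, primary, secondary, set_repeat):
    """Combine several files' traces; `files` is a list of lists of header dicts.

    set_repeat=True : one panel per file, tagged REPEAT = 1, 2, ...
    set_repeat=False: all files merged into a single primary/secondary ordering.
    """
    def order(trace):
        return (trace[primary], trace[secondary])

    if set_repeat:
        out = []
        for panel, traces in enumerate(files, start=1):
            out.extend({**t, "REPEAT": panel} for t in sorted(traces, key=order))
        return out
    pooled = [t for traces in files for t in traces]
    return sorted(pooled, key=order)


# Two hypothetical files whose native order differs from the requested one.
files = [
    [{"SHOTID": 2, "CHANNEL": 1}, {"SHOTID": 1, "CHANNEL": 1}],
    [{"SHOTID": 1, "CHANNEL": 2}],
]

merged = read_file_list(files, "SHOTID", "CHANNEL", set_repeat=False)
panels = read_file_list(files, "SHOTID", "CHANNEL", set_repeat=True)

print([(t["SHOTID"], t["CHANNEL"]) for t in merged])
print([(t["SHOTID"], t["CHANNEL"], t["REPEAT"]) for t in panels])
```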
Points to Note
• Can only specify a primary key in a Seismic File List
– Same as DISCWRITE, although the original data order no longer matters
• User needs to manage the merging of extended trace headers
– Use DELHDR prior to merging files; this requirement will be removed in future releases
• Files can be 10-15% larger than SEGY
• Compatible with Cluster File Systems (Gluster etc.)
• I/O above about 2 Gbytes should be improved
Future development
• Improved PKEY/SKEY/TKEY selection handling
• Direct update of trace headers from applications
– Geometry, SV (FB picks) etc.
• Add HDF5 support in KPRET2D
– Only module where this is not available
• Add full parallel I/O to iMage suite
– Increase parallel scalability even further
• Algorithmic optimisation
– Re-write to take full advantage of random access

A quick start guide to using HDF5 files in GLOBE Claritas
