Subsetting at UAH

Bruce Beaumont, Matt Smith,
Helen Conover, Sara Graves

HDF & HDF-EOS Workshop VIII
2004 October 26-28 Aurora, CO
Why Subset?
• Goal: to provide a science data user with only the data
they need as quickly as possible.
• Benefits science data users and data centers:
- Reduces analysis time by reducing amount of data
- Reduces time for data delivery
- Reduces resources (network, personnel, media, etc.)
• Steps:
- Locate spatial / temporal / spectral area of interest
- Extract
- Re-assemble for distribution/use
What is HSE?
HEW Subsetting Engine
A new packaging option for the HDF-EOS Subsetter, designed for users who
want subsetting but do not want to host the full HEW installation or even
the standalone HEW back-end.
What is HEW?
• HDF-EOS Web-based Subsetter
– Prototype software designed to be dataset-independent (HDF-EOS)
– Funded by NASA/ESDIS for EOS-DIS
– To be used within ECS (EOS-DIS Core System)
– Original Front-end/GUI (optional)
• Uses HTML forms and JavaScript

– Original Back-end
• Needed subset criteria and HDF-EOS data
• Performed subsetting as a “batch” job
What are HSE’s capabilities?
• Versions available for HDF-EOS 2 and HDF-EOS 5.
• Subsets multiple files in one call.
• Subsets properly-formatted HDF-EOS grid and swath objects.
• Subsets spatially by latitude/longitude or row/column.
• Subsets swaths temporally by date/time range.
• Subsets swaths by full or partial (subscan) lines.
• Subsets spectrally by HDF-EOS field.
• Subsamples along any dimension by stride (repeating interval)
or by discrete index list. (see chart)
• Copies all file and field attributes to the output file and updates
HDF-EOS “core” metadata when possible.
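As a rough illustration of spatial subsetting by latitude/longitude, the sketch below selects the rows and columns of a small grid whose cell centers fall inside a bounding box. It assumes a regular grid described by 1-D latitude and longitude axes and does not handle boxes that cross the antimeridian. The names `BoundingBox`, `grid_rows_cols`, and `subset_grid` are hypothetical and are not part of HSE or the HDF-EOS library.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Latitude/longitude limits in degrees (hypothetical helper type)."""
    south: float
    north: float
    west: float
    east: float


def grid_rows_cols(lats, lons, box):
    """Return the row and column indexes whose cell centers lie inside box."""
    rows = [i for i, lat in enumerate(lats) if box.south <= lat <= box.north]
    cols = [j for j, lon in enumerate(lons) if box.west <= lon <= box.east]
    return rows, cols


def subset_grid(field, rows, cols):
    """Extract the selected rows and columns from a 2-D field (list of lists)."""
    return [[field[i][j] for j in cols] for i in rows]


# Toy 4 x 3 grid: each value encodes its own row and column as row*10 + col.
lats = [40.0, 39.0, 38.0, 37.0]
lons = [-106.0, -105.0, -104.0]
temperature = [[r * 10 + c for c in range(len(lons))] for r in range(len(lats))]

box = BoundingBox(south=37.5, north=39.5, west=-105.5, east=-103.0)
rows, cols = grid_rows_cols(lats, lons, box)
subset = subset_grid(temperature, rows, cols)
print(rows, cols)  # [1, 2] [1, 2]
print(subset)      # [[11, 12], [21, 22]]
```

Selecting by row/column, the other spatial option listed above, amounts to supplying `rows` and `cols` directly instead of deriving them from a bounding box.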
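Stride Subsampling on a Dimension
• STRIDE = 2
[Chart: elements 0 through 9 along a dimension, with every second element marked as selected.]
Indexed Subsampling on a Dimension
• INDEXES = (1, 3, 4, 5, 7)
[Chart: elements along a dimension, with elements 1, 3, 4, 5, and 7 marked as selected.]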
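The two subsampling modes can be sketched in a few lines of Python. This is only an illustration of the idea, not HSE code: the function names are hypothetical, and starting the stride at element 0 is an assumption.

```python
def subsample_by_stride(values, stride, start=0):
    """Keep every stride-th element, beginning at start (assumed to be 0)."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return values[start::stride]


def subsample_by_indexes(values, indexes):
    """Keep only the elements at the listed positions, in the order given."""
    return [values[i] for i in indexes]


dimension = list(range(10))  # elements 0..9 along one dimension

print(subsample_by_stride(dimension, 2))                 # [0, 2, 4, 6, 8]
print(subsample_by_indexes(dimension, (1, 3, 4, 5, 7)))  # [1, 3, 4, 5, 7]
```

A stride is compact when the selection is regular; an index list covers irregular selections that no single repeating interval can express.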
Existing HEW Back-End Architecture
[Diagram: HEW_DISS reads the subsetting criteria (ODL) and input file(s); it writes output file(s), a status file, and a log file, and sends e-mail to the end user.]

• Front end supplies
subsetting criteria file
• Subsetter writes messages
to status file
• Subsetter writes messages
to log file
• Subsetter sends e-mail to
end-user

Many sites do not want to create ODL files or deal with message files,
log files, or e-mail.
New HSE Architecture
All subsetting functionality is contained
within a callable function
[Diagram: User Application Code passes subsetting criteria to the HEW Subsetting Engine function and receives a return code. The engine reads input file(s), writes output file(s), and invokes the HSE_LogMsg and HSE_StatusMsg callbacks.]
• User application code builds subsetting criteria structure
• Subsetting engine function calls user’s functions for status and log messages
• No e-mail is sent
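The calling pattern described above can be sketched as follows. This is a hypothetical Python illustration of the callback design only: `run_subsetter`, the criteria keys, and the file names are invented for the example and do not reflect the actual HSE function signature or criteria structure. The two callback parameters stand in for the roles that HSE_LogMsg and HSE_StatusMsg play in the diagram.

```python
from typing import Callable, Dict, List

MessageCallback = Callable[[str], None]


def run_subsetter(
    criteria: Dict[str, object],
    input_files: List[str],
    log_msg: MessageCallback,
    status_msg: MessageCallback,
) -> int:
    """Stand-in for a subsetting engine call (hypothetical, does no real I/O).

    Messages go to the caller's callbacks instead of to status files,
    log files, or e-mail. Returns 0 on success and 1 on failure.
    """
    if not input_files:
        log_msg("no input files supplied")
        return 1
    for index, name in enumerate(input_files, start=1):
        log_msg(f"subsetting {name} with fields {criteria.get('fields')}")
        status_msg(f"{index}/{len(input_files)} files processed")
    return 0


# The application builds the criteria in memory and decides where messages go.
log_lines: List[str] = []
status_lines: List[str] = []
criteria = {"fields": ["Temperature"], "stride": 2}

return_code = run_subsetter(
    criteria, ["a.hdf", "b.hdf"], log_lines.append, status_lines.append
)
print(return_code)       # 0
print(status_lines[-1])  # 2/2 files processed
```

Because the application owns both callbacks, it can route messages to a GUI, a database, or its own log without the engine knowing about any of them.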
SPOT
• Subsettability “checker”
– Displays content/structure
of HDF-EOS4 (or HE5) files
– Examines files for subsettability by HSE
– Simple command-line interface
– Stand-alone operation
– Available at subset.org
HEW integration with ECS
[Diagram: numbered data flow (steps 1 through 7) among the end user, EDG order submission (HTML) in the EDG System, ECS, and the Subsetter in the Subsetting System. Labeled flows: data order and reply; subset ODL and reply; input data; output data; output data (reingested).]
Product Availability Matrix
Product   HDF-EOS 2 (HDF4)     HDF-EOS 5 (HDF5)
HSE       SGI • Sun • Linux    N/A
HSE-5     N/A                  SGI • Sun • Linux
HEWBE     SGI • Sun • Linux    Planned
HEW       SGI • Sun • Linux    N/A
SPOT      SGI • Sun • Linux (single entry spanning both columns)
Currently Available/Planned Subsetting Applications
Application                                   Status      Deployments
• HDF-EOS Subsetting Engine (HDF-EOS, HE5)
– Complete System                             available   GHRC
– Subsetting Engine Only                      available   GSFC
– SPOT - Subsettability Checker               available   many
– HSE Integrated with ECS Data Order System   available   NSIDC, EDC
– HSE Integrated with AMSR-E Processing       available   AMSR-E SIPS
– Subsetting as a Web Service                 planned     (ECHO)
• Customized Subsetting
– MODIS tools                                 available   Science teams
– Coarse-grain SSM/I Subsetter                available   GHRC
• General Purpose Customizable Subsetting
– Subsetting Tool using ESML                  in work     (various)
http://subset.org
