Aashish Chaudhary
aashish.chaudhary@kitware.com
Technical Leader
with Patrick O’Leary, Dr. Rama Nemani (NASA), Chris Harris, Chris Kotfila, Doruk Aztek, Andrew Michaelis (NASA)
Open-source Scientific Computing and Data Analytics using HDF
July 24th 2017
ESIP Summer
What We Do at Kitware
Open source and open data are strongly encouraged and practiced at Kitware
It started with VTK
Parallel Processing and Rendering - ParaView
Computer Vision
Images, Video, Point Clouds
- Recognition by Function
- Content-based Retrieval
- Event & Activity Recognition
- Anomaly Detection
- 3D Extraction and Compression
- Detection & Tracking
Medical Computing
- Quantitative imaging
- Electronic health records
- Vascular analysis
- Surgical guidance and simulation
- Digital pathology
- Orthopedic analysis
- Longitudinal and population shape analysis
- Interactive medical applications and visualizations
Community Adaptation
HDF at Kitware
High Performance Computing
Extensible Data Model and Format (XDMF)
- Developed to exchange scientific data between HPC codes and tools
- Heavy data is stored using HDF5
Climate Community
Network Common Data Form (NetCDF)
- Most projects use NetCDF4
Medical Community, Vision Community
Leading-edge algorithms for registering and segmenting multidimensional data
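The split between light and heavy data in the Extensible Data Model and Format can be sketched as follows. This is a minimal illustration, not Kitware's tooling: the file name `fields.h5`, the dataset path `/temperature`, and the helper `build_xdmf` are hypothetical, and the element and attribute names follow the XDMF XML conventions as commonly documented, so they should be checked against the XDMF reference before use. The XML only describes the grid and points at an array that is assumed to already exist in an HDF5 file.

```python
import xml.etree.ElementTree as ET


def build_xdmf(h5_name, dataset_path, dims, field_name="temperature"):
    """Return XDMF light data (XML text) for one scalar field on a uniform grid.

    The heavy data (the array itself) is assumed to live in the HDF5 file
    `h5_name` under `dataset_path`; only a reference to it is written here.
    """
    dims_text = " ".join(str(d) for d in dims)

    root = ET.Element("Xdmf", Version="2.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="grid", GridType="Uniform")

    # Regular mesh: an origin plus a constant spacing along each axis.
    ET.SubElement(grid, "Topology", TopologyType="3DCoRectMesh", Dimensions=dims_text)
    geometry = ET.SubElement(grid, "Geometry", GeometryType="ORIGIN_DXDYDZ")
    for values in ("0 0 0", "1 1 1"):
        item = ET.SubElement(geometry, "DataItem", Format="XML", Dimensions="3")
        item.text = values

    # The field values stay in HDF5; the XML holds "file:/path/to/dataset".
    attribute = ET.SubElement(
        grid, "Attribute", Name=field_name, AttributeType="Scalar", Center="Node"
    )
    heavy = ET.SubElement(
        attribute, "DataItem", Format="HDF", Dimensions=dims_text,
        NumberType="Float", Precision="4",
    )
    heavy.text = f"{h5_name}:{dataset_path}"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xdmf("fields.h5", "/temperature", (4, 5, 6)))
```

Keeping the description in a small XML file means a tool can inspect the grid layout without opening the (potentially very large) HDF5 file.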
ACME
The Accelerated Climate Modeling for Energy
(ACME) project is sponsored by the Earth System
Modeling (ESM) program (Biological and
Environmental Research) with eight national
laboratories and six partner institutions to develop
and apply the most complete, leading-edge climate
and Earth system models to challenging and
demanding climate-change research imperatives.
- Most commonly used data format: NetCDF4
- Data streaming using OPeNDAP
- Python interface for most of the tools
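As a rough illustration of the OPeNDAP streaming mentioned above, the sketch below builds a DAP2 request URL that asks the server for only a sub-array of one variable. The server address, the variable name `tasmax`, and the helper `opendap_subset_url` are hypothetical; the `[start:stride:stop]` hyperslab syntax and the `.ascii` response suffix are standard DAP2 conventions. No request is sent here, so the example runs without network access.

```python
def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 URL selecting a hyperslab of one variable.

    `slices` holds one (start, stop, stride) tuple per dimension. The stop
    index is inclusive, as DAP2 constraint expressions expect.
    """
    hyperslab = "".join(
        f"[{start}:{stride}:{stop}]" for start, stop, stride in slices
    )
    return f"{dataset_url}.{response}?{variable}{hyperslab}"


if __name__ == "__main__":
    # Hypothetical dataset: first 10 time steps of a 100 x 100 spatial window.
    url = opendap_subset_url(
        "http://example.org/opendap/tasmax.nc",
        "tasmax",
        [(0, 9, 1), (100, 199, 1), (200, 299, 1)],
    )
    print(url)
```

Because the subsetting happens on the server, a client only transfers the requested window rather than the whole NetCDF4 file.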
OpenNEX
NEX is a platform for scientific
collaboration, knowledge sharing and
research for the Earth science
community
Global Daily Downscaled Projections (NEX-GDDP, NetCDF4)
MODIS-Land and Atmosphere (HDF)
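Since the two collections above arrive in different containers, a tool often needs to tell them apart before choosing a reader. The sketch below is one possible approach, assuming files are identified by their leading bytes: NetCDF4 files are HDF5 files and start with the 8-byte HDF5 signature, HDF4 files (the format MODIS products have traditionally used) start with a 4-byte magic number, and classic NetCDF files start with `CDF`. The function names are hypothetical, and the check ignores HDF5 files that carry a user block before the signature.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # also the first bytes of NetCDF4 files
HDF4_MAGIC = b"\x0e\x03\x13\x01"


def classify_header(head):
    """Guess the container format from the first bytes of a file."""
    if head[:8] == HDF5_SIGNATURE:
        return "HDF5 (includes NetCDF4)"
    if head[:4] == HDF4_MAGIC:
        return "HDF4"
    if head[:3] == b"CDF":
        return "classic NetCDF"
    return "unknown"


def sniff_format(path):
    """Read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return classify_header(handle.read(8))


if __name__ == "__main__":
    print(classify_header(HDF5_SIGNATURE))
    print(classify_header(HDF4_MAGIC + b"\x00\x00\x00\x00"))
```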
Gaia
Data processing, Web Visualization
Pure JS?
HDF5 File Organization
Preprocessing Simulation Postprocessing
Possible Improvements
Streaming and Big Data analytics
- Any useful ingestion of HDF data into a cluster requires an ETL pipeline
- For some tools, computation cannot move close to the data; streaming support is necessary in such cases
- Optimal read/write on cloud storage
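One common first step in the kind of ETL pipeline described above is to split a large array into independent blocks that cluster workers can read separately. The sketch below only computes the block boundaries; `iter_slabs` and the shapes used are hypothetical, and a real pipeline would presumably read each block with an HDF5 or NetCDF library and align the block shape with the file's chunk layout.

```python
from itertools import product


def iter_slabs(shape, block_shape):
    """Yield tuples of slices that tile an array of `shape` into blocks.

    Each block is at most `block_shape` along every axis; blocks on the
    trailing edge are smaller when the sizes do not divide evenly.
    """
    per_axis = []
    for size, block in zip(shape, block_shape):
        per_axis.append(
            [slice(start, min(start + block, size)) for start in range(0, size, block)]
        )
    yield from product(*per_axis)


if __name__ == "__main__":
    # Hypothetical (time, lat, lon) array split into 4 x 2 x 2 = 16 work items.
    for slab in iter_slabs((365, 720, 1440), (100, 360, 720)):
        print(slab)
```

Each yielded tuple can be used directly as an index into an array-like dataset, so every worker touches only its own region of the file.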
Web-Support
- More tools and projects are moving
to support web-enabled data
analysis and visualization
- Pure JS implementation if possible
Summary
● HDF is a widely used data format for scientific computing, climate/geospatial visualization, and other domains at Kitware
● Recently we have started using HDF for information visualization
● We are looking forward to HDF usage in cloud and web environments
● Kitware is always looking for strong open source collaborations and is committed to pushing open-source scientific computing to its next level
Information
Aashish Chaudhary: aashish.chaudhary@kitware.com
LinkedIn: www.linkedin.com/in/aachaudhary
Kitware: http://www.kitware.com
NASA-NEX: https://nex.nasa.gov/nex
Kitware-AIST: https://github.com/OpenGeoscience/nex
HPC Cloud: http://www.kitware.com/publications/item/view/1784
HPCloud Github: https://github.com/Kitware/HPCCloud
