How OPeNDAP has transformed the way we do science plus snapshots of
recent developments and BoM's operational systems built on this technology
Tim Pugh
SPEDDEXES workshop
17-21 March 2014
Evolution
• Traditionally…
– Scientific research is conducted in a quiet room in isolation
utilising unique data, scripts, and code
– Scientific collaboration is conducted at conferences with file
sharing by FTP or HTTP bulk download
• Today
– Scientific research is being driven to shared research services
and supported infrastructure
• To relieve the scientist of laborious developments
• To manage more complex machinery
• To improve scientific integrity and collaboration
• To work within managed and supported infrastructure
– Science is moving from file sharing to data sharing collaboration
CAWCR Research Data Server
• Location: http://opendap.bom.gov.au:8080/thredds
• Unidata THREDDS Data Server v4.2.8
• http://www.unidata.ucar.edu/projects/THREDDS/tech/TDS.html
• The THREDDS Data Server (TDS) is a Java servlet contained in a single
WAR file, which allows very easy installation into the Tomcat web server.
OPeNDAP Now Is:
• An acronym
– “Open-source Project for a Network Data Access
Protocol”
– Often a synonym for “DAP”
• A not-for-profit corp. developing/supporting
– “DAPx” - a web-services protocol for data access
• Deployed by hundreds of data providers internationally
• Employed in many analysis packages (e.g. MATLAB)
• Designated a “Community Standard” by NASA
– Server & client implementations* of DAP
*Note: there are other implementations
BROAD VISION
1. A world in which a single data access protocol is used for the
exchange of data between network-based applications regardless
of discipline.
2. A layer above TCP/IP providing for syntactic and semantic
consistency not available in existing protocols such as FTP.
Fundamental Objective of OPeNDAP
• The fundamental objective of OPeNDAP and OPeNDAP Inc. is to
facilitate internet access to scientific data
• This is done by:
• Providing a protocol (DAP) to access data over the internet,
• Hiding the format (and organization) in which the data are stored from
the user, and
• Providing subsetting (and other) capabilities for the data at the server
• OPeNDAP is based on a multi-tier architecture
• OPeNDAP software is open source
OPeNDAP Data-Type Philosophy
• The OPeNDAP data model has few data types
– Simplified programming / lowered risk of errors
• They are intentionally discipline-neutral
– Better trans-domain utility & programmer uptake
• They nonetheless fill discipline-specific needs
– netCDF-like (good in contexts where, e.g., data might represent functions with 4- or 5-D domains)
– Sequences & selections match DBMS sensibilities
TDS Server
• TDS is THREDDS Data Server
– THREDDS is Thematic Real-time Environmental Distributed Data Services
– Middleware to bridge the gap between data providers and data users
– THREDDS Data Server (TDS), a web server that provides catalog, metadata,
and data access services for scientific datasets.
– The TDS is open source, 100% Java, and runs inside the open source Tomcat
Servlet container.
• Unidata's Common Data Model
– merges the OPeNDAP, netCDF, and HDF5 data models to create a common API
for scientific data
– implemented by the NetCDF Java library
– read netCDF, OPeNDAP, HDF5, HDF4, GRIB 1 & 2, BUFR, NEXRAD 2 & 3,
GEMPAK, MCIDAS, GINI, among others
– A pluggable framework allows other developers to add readers for their own
specialized formats.
– provides standard APIs for geo-referencing coordinate systems, and specialized
queries for scientific feature types like Grid, Point, and Radial datasets
Some of the Technology
in the TDS
1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and
associated metadata.
2. The Netcdf-Java/CDM library reads NetCDF, OPeNDAP, and HDF5 datasets, as well as other
binary formats such as GRIB and NEXRAD, presenting essentially an (extended) netCDF view of the data.
3. TDS can use the NetCDF Markup Language (NcML) to modify and create virtual aggregations of
datasets.
4. An integrated server provides OPeNDAP access, a data access method that supports subsetting.
5. An integrated server provides bulk file access through the HTTP protocol.
6. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web
Coverage Service (WCS) protocol, for any "gridded" dataset whose coordinate system
information is complete.
7. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web Map
Service (WMS) protocol, for any "gridded" dataset whose coordinate system information is
complete.
8. The integrated ncISO server provides automated metadata analysis and ISO metadata
generation.
THREDDS Catalog
• The goal is…
– to simplify the discovery and use of scientific data and to allow scientific
publications and educational materials to reference scientific data.
– initial focus was to allow data users to find datasets that are pertinent to their
specific education and research needs, access the data, and use them without
necessarily downloading the entire file to their local system.
– Catalogs are the heart of the data access services and the central THREDDS
concept. Catalogs consist of XML documents that describe on-line datasets.
– Catalogs can contain arbitrary metadata; however, we also defined a standard set
of metadata to bridge to discovery centers
• CF (Climate & Forecast) and Unidata Data Discovery metadata
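Because catalogs are XML documents, a client can walk one with a standard XML parser. The following is a minimal sketch, assuming the InvCatalog 1.0 namespace; the sample catalog, the server name and the function name list_opendap_urls are hypothetical and only illustrate the idea of turning catalog entries into OPeNDAP URLs.

```python
import xml.etree.ElementTree as ET

# Namespace used by THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog content, for illustration only.
SAMPLE_CATALOG = """
<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example SST collection">
    <dataset name="sst_20111106.nc" urlPath="example/2011/sst_20111106.nc"/>
    <dataset name="sst_20111107.nc" urlPath="example/2011/sst_20111107.nc"/>
  </dataset>
</catalog>
"""


def list_opendap_urls(catalog_xml, server):
    """Return one OPeNDAP URL per dataset that declares a urlPath."""
    root = ET.fromstring(catalog_xml)

    # Find the base path of the first OPeNDAP service in the catalog.
    base = None
    for service in root.iter("{%s}service" % THREDDS_NS):
        if service.get("serviceType", "").upper() == "OPENDAP":
            base = service.get("base")
            break
    if base is None:
        return []

    # Container datasets have no urlPath; only leaf datasets are returned.
    urls = []
    for dataset in root.iter("{%s}dataset" % THREDDS_NS):
        url_path = dataset.get("urlPath")
        if url_path:
            urls.append(server + base + url_path)
    return urls


if __name__ == "__main__":
    for url in list_opendap_urls(SAMPLE_CATALOG, "http://example.org:8080"):
        print(url)
```

A real catalog may also use catalogRef elements to point at nested catalogs; this sketch does not follow them.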
Spectrum of Use Cases
• Application Data Representation
– OGC data model: domain specific; geospatial, 1-D, 2-D
– DAP2 data model: domain neutral; n-D, time series
– **DAP4 data model: domain neutral; new data types and data structures; streaming, compressed, chunked
– Common Data Model (CDM): domain specific
– Future data model: domain neutral??
• Application Types
– Programmatic / Language API: FORTRAN, C/C++, Java, Python, NetCDF, Java NetCDF
– Programmatic / Tools: NetCDF, NCO, PyDAP; custom tools: OPeNDAP crawler, ocean_prep
– Interactive Data Viewer: IDV, Panoply, IDL, MATLAB, iPython (matplotlib), NCL, web browser (metadata)
– Interactive Analysis: MATLAB, IDL, iPython, NCL; custom application: Inundation Modeller
– Web Application: Live Access Server; IMOS Data Portal (WMS); custom Java servlet
• Programming
– DAP2 legacy code: existing tools
– DAP2 new code: new tools
– **DAP4 programming: legacy code support
– **DAP4 programming: new data model and protocols, streaming support
– **DAP4 programming: asynchronous access modes, server-side processing
• Data Access Protocol
– Metadata request: das, dds, ddx
– ASCII/binary data request: simple data representation
– DAP binary object request
– NcML data request: aggregation, virtual data sets
– **DAP4: server-side operations, async access mode, new data model, posting
• Syntax
– Return data set info: file.nc.dds (readable), file.nc.ddx (XML), file.nc.asc (ASCII data return)
– Select variables: file.nc.dods?var1,var2,var3
– Subset arrays: file.nc.dods?var1[0:1:10]
– Return file translations: file.nc.netcdf (NetCDF file)
– Server-side operations: file.nc?GEOLOC()
– Async access mode: ??
• Clients
– Programmatic access: tsunami inundation modeller, NetCDF, NCO, PyDAP, PyNetCDF, MATLAB, IDL, …
– Interactive access: web browser (catalog), MATLAB, IDL, Python, Panoply, …
– Data library & catalog service: metadata harvesting, directory listings, remote THREDDS services
– Web service: Java servlet, Java applet
– Geospatial information service: OPeNDAP data service
– Analysis service: Live Access Server
• Service Capabilities
– DAP2 response: metadata, dods, ASCII / binary
– **DAP4 response: async access mode, server-side, streaming
– NcML: aggregation service, virtual data set service, remote data access
– Metadata conversion and RDF: metadata definitions, translations (-> ISO), semantics, ontology; CF->ISO, CF->WMS, CF->WCS
– Layered services: catalogue service, WMS and WCS services, authentication
– Conformance checks: CF metadata check, ISO metadata check
**DAP4 features listed are my estimate, not the official specification
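The DAP2 request syntax summarised above (a response suffix, an optional list of variables, and optional index ranges) can be assembled programmatically. This is a minimal sketch; the function names and the example.org URL are hypothetical, and the index ranges use the square-bracket [start:stride:end] form that appears in the real request URL later in this deck.

```python
def hyperslab(start, stride, stop):
    """Format one DAP2 index range, e.g. [0:1:10]."""
    return "[%d:%d:%d]" % (start, stride, stop)


def dap2_url(dataset_url, response, projections=None):
    """Build a DAP2 request URL.

    dataset_url -- URL of the dataset without any response suffix
    response    -- response suffix such as "dds", "das", "ascii" or "dods"
    projections -- optional list of (variable_name, ranges) pairs, where
                   ranges is a list of (start, stride, stop) tuples and
                   may be empty to request the whole variable
    """
    url = dataset_url + "." + response
    if projections:
        terms = []
        for name, ranges in projections:
            terms.append(name + "".join(hyperslab(*r) for r in ranges))
        url += "?" + ",".join(terms)
    return url


if __name__ == "__main__":
    base = "http://example.org/thredds/dodsC/file.nc"  # hypothetical dataset
    print(dap2_url(base, "dds"))
    print(dap2_url(base, "ascii", [("var1", []), ("var2", [])]))
    print(dap2_url(base, "dods", [("var1", [(0, 1, 10)])]))
```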
Use Case limitations
• Time to access data is dependent on the following
factors:
• Hardware and network performance
• Selection of variables and dimensions
• Number of data requests to be issued
− Latency inherent in the data request
• Number of concurrent accesses to the server
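To see how these factors interact, here is a deliberately crude back-of-the-envelope model. It is my assumption, not something measured on any server: each request pays a fixed latency, the payload moves at the available bandwidth, and concurrent clients share that bandwidth equally. All names and numbers are hypothetical.

```python
def estimated_access_time(n_requests, latency_s, payload_bytes,
                          bandwidth_bytes_per_s, concurrent_clients=1):
    """Rough estimate of total transfer time in seconds.

    Assumes requests are issued one after another and that bandwidth is
    split evenly between concurrent clients.
    """
    effective_bandwidth = bandwidth_bytes_per_s / max(1, concurrent_clients)
    return n_requests * latency_s + payload_bytes / effective_bandwidth


if __name__ == "__main__":
    # Same 100 MB payload fetched as 1 large request or 1000 small ones.
    for n in (1, 1000):
        t = estimated_access_time(n, 0.05, 100e6, 10e6)
        print("%4d requests: %.1f s" % (n, t))
```

Under these assumptions the per-request latency dominates once a subset is split into many small requests, which is why selecting variables and dimensions carefully matters.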
DAP-enabled client
tools/applications
OPeNDAP Clients (partial list) http://opendap.org/whatClients
1. Web browser – returns ASCII data
2. Pydap – a pure Python implementation of the DAP2 protocol
3. NetCDF – a set of software libraries and self-describing, machine-independent
data formats with interfaces to Python, FORTRAN, C/C++, and Java
4. NCO – a dozen standalone, command-line programs that take
netCDF files as input
5. MATLAB – a high-level technical computing language and interactive
environment for algorithm development, data visualization, data analysis,
and numerical computation
6. Panoply – a cross-platform application that plots geo-gridded
arrays from netCDF, HDF and GRIB datasets
Developments by
Bureau and CSIRO
• Development of web portals for data access services and
information systems in climate and environment
– Seasonal Climate Outlook Rebuild (Roald de Wit)
– Natural Resource Management (NRM) Climate Change Portal (Tim
Erwin)
– eReef's Marine Quality Dashboard and data services (Jonathon Hodge)
– National Environmental Information Infrastructure (NEII) (Andrew Woolf)
– CAWCR research data services (Duan Beckett)
• Establish Climate Data Publishing services at NCI
– NCI, CSIRO, Bureau of Meteorology, CoE CSS
– Earth System Grid (ESG)
– Climate and Weather Science Laboratory (CWSLab)
SCO-R Project overview
• More interactivity and functionality needed
• Demand for POAMA multi-week forecast products
• Long-term view of seamless transition between forecasts
• Building upon experiences and technologies from other BoM projects
(e.g. MetEye and PASAP/PACCSAP)
SCO-R architecture
(Diagram components: MapCache; BOM.Map / BOM.App; custom WMS service (Python))
Climate Futures
Climate Futures approach to the provision of regional climate projection information
CMAR/CLIMATE ADAPTATION FLAGSHIP
Tim Erwin
Acknowledgements: Penny Whetton, Kevin Hennessy, John Clarke, David Kent
28 October 2013
Climate Data
• Processed from climate model data (CMIP3 and CMIP5)
• NetCDF file format
• 10 variables (temperature, rainfall, humidity...)
• 20 year seasonal averages (2030, 2035, ..., 2090)
• Base period (1950 – 2005) stored as monthly time span
• Catalogued in THREDDS server
– Allows DAP access
• Django
• THREDDS catalogues are parsed and stored
– model, variable, dap url, layer name, time span
Architecture
(Diagram components: THREDDS, ZOO)
ZOO-Project
(WPS Server)
• Consists of: Kernel, API, Service
• Works with Apache through a cgi file and a conf file
• Supports several common programming languages: C/C++,
Fortran, Python, PHP, Perl, Java, JavaScript
• Used to create area averages of gridded data using a
non-rectangular mask
• Predefined mask
• Polygon (GML,KML,GEOM)
• Not limited to geographic operations
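The area average over a non-rectangular mask can be illustrated in a few lines. This is a sketch under my own assumptions, not the portal's actual WPS service: it assumes a regular latitude-longitude grid, a boolean mask already rasterised from the polygon, and cos(latitude) weights to approximate cell area. The function name is hypothetical.

```python
import math


def masked_area_average(values, lats, mask):
    """Area-weighted mean of the grid cells selected by a mask.

    values -- 2-D list indexed [lat][lon]
    lats   -- latitude in degrees for each row of values
    mask   -- 2-D list of booleans, True where the cell is inside the region
    """
    total = 0.0
    weight_sum = 0.0
    for row_values, row_mask, lat in zip(values, mask, lats):
        # On a regular lat-lon grid, cell area scales with cos(latitude).
        weight = math.cos(math.radians(lat))
        for value, inside in zip(row_values, row_mask):
            if inside:
                total += weight * value
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("mask selects no grid cells")
    return total / weight_sum


if __name__ == "__main__":
    values = [[1.0, 2.0], [3.0, 4.0]]
    lats = [0.0, 60.0]
    mask = [[True, False], [False, True]]  # a non-rectangular selection
    print(masked_area_average(values, lats, mask))
```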
OPeNDAP Technology
Developments
• DAP4 protocol and data model implementation (OPULS)
– OPULS (an OPeNDAP-Unidata collaboration)
– DAP4 (to supersede DAP2)
– Experimental extensions (Async access, UGRID subsets)
• DAP2 & DAP4 JSON response type
– Improve JavaScript client utilisation of DAP services
• ncWMS integration and WMS extensions
– contour map types
– THREDDS and Hyrax integration of ncWMS
• Programmatic Data Access for secure services
– RDSI DaSh project to support programmatic data access
– Integration within reX Identity and Authorisation Management
DAP4 Experiments
• DAP4 provides more complete support for functions
including metadata responses (DAP2 does not provide
this; a gap in the DAP2 specification)
– Experiments with Unstructured Grid (irregular mesh) subsetting
– Binning: returns a distribution (as a raster of boolean values on a
user-specified grid) of data values satisfying some criteria
– Masking: accepts a raster of zero/nonzero values as a query
argument, perhaps as a geospatial selection criterion
• OPeNDAP is running several experimental mini-projects
within this context:
– Asynchronous access, data streaming, cloud computing and an
expanded, function-based, server-side processing system
thank you – have a great experience
Tim F. Pugh
HPC and CWSLab Project Lead
Melbourne, Victoria, Australia
Email: t.pugh@bom.gov.au
Office: +61 3 9669 4345
Workshop Use-Cases
• Application Data Representation
– DAP2 data model: domain neutral; n-D, time series
• Application Types
– Programmatic / Language API: FORTRAN, C/C++, Java, Python, NetCDF, Java NetCDF, PyDAP
– Programmatic / Tools: NetCDF, NCO, PyDAP; custom tools: OPeNDAP crawler
– Interactive Data Viewer: Panoply, MATLAB, NCL, web browser
• Programming
– DAP2 legacy code: existing tools
– DAP2 new code: new tools
• Data Access Protocol
– Metadata request: das, dds, ddx
– ASCII/binary data request: simple data representation
– DAP binary object request
– NcML data request: aggregation
• Syntax
– Return metadata info: file.nc.das (readable), file.nc.dds (readable), file.nc.ddx (XML metadata), file.nc.help (help info)
– Select variables and return data: file.nc.asc?var1,var2,var3; file.nc.dods?var1,var2,var3
– Subset arrays and return data: file.nc.asc?var1[0:1:10]; file.nc.dods?var1[0:1:10]
– Return file translations: file.nc.netcdf (NetCDF file)
– Server-side operations: file.nc?GEOLOC()
• Clients
– Programmatic access: NetCDF, NCO, PyDAP, PyNetCDF
– Interactive access: web browser (catalog), Python, MATLAB, Panoply
• Service Capabilities
– DAP2 response: THREDDS data service, Hyrax data service
– NcML: aggregation service
– Layered services: catalog service, WMS
Pydap client
>>> from pydap.client import open_url
>>> dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')
>>> var = dataset['SST']
>>> var.shape
(12, 90, 180)
>>> var.type
<class 'pydap.model.Float32'>
>>> print(var[0,10:14,10:14])  # this will download data from the server
<class 'pydap.model.GridType'>
with data
[[ -1.26285708e+00 -9.99999979e+33 -9.99999979e+33 -9.99999979e+33]
[ -7.69166648e-01 -7.79999971e-01 -6.75454497e-01 -5.95714271e-01]
[ 1.28333330e-01 -5.00000156e-02 -6.36363626e-02 -1.41666666e-01]
[ 6.38000011e-01 8.95384610e-01 7.21666634e-01 8.10000002e-01]]
and axes
366.0
[-69. -67. -65. -63.]
[ 41. 43. 45. 47.]
NetCDF client
>>> import netCDF4
>>> url = 'http://test.opendap.org/dap/data/nc/coads_climatology.nc'
>>> dataset = netCDF4.Dataset(url)
>>> var = dataset.variables['SST']
>>> var.shape
(12, 90, 180)
>>> print(var[0,10:14,10:14])  # this will download data from the server
[[-1.26285707951 -- -- --]
[-0.769166648388 -0.77999997139 -0.675454497337 -0.595714271069]
[0.128333330154 -0.0500000156462 -0.0636363625526 -0.141666665673]
[0.638000011444 0.895384609699 0.721666634083 0.810000002384]]
>>> print(var)
<type 'netCDF4.Variable'>
float32 SST('TIME', 'COADSY', 'COADSX')
…
MATLAB and SNCtools
% ex_snctools_opendap.m
% Read the same file from a remote OPeNDAP server
%
ncRef = 'http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc'
nc_dump( ncRef );
pause
temp = nc_varget( ncRef, 'analysed_sst');
lon = nc_varget( ncRef, 'lon');
lat = nc_varget( ncRef, 'lat');
% imagesc takes the x axis (longitude) first, then the y axis (latitude)
imagesc(lon, lat, temp); axis xy
MATLAB and NJTbx demo
% ex_njtbx.m
% Read the same file from a remote OPeNDAP server
%
ncRef = 'http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc'
nj_info( ncRef )
pause
[temp, grid] = nj_grid_varget(ncRef,'analysed_sst');
imagesc(grid.lon, grid.lat, temp); axis xy; colorbar
THREDDS services syntax
http://{server:port}/{contextPath}/{service}/...
{contextPath} = “thredds” (servlet default name)
{service} = “fileServer” | “dodsC” | “wms” | “wcs” | “ncss” | “admin”
• Bulk File Transfer
fileServer = HTTP Server (any file)
• Remote access, subsetting CDM files
dodsC = OPeNDAP (any CDM file)
wms = Web Map Server (grids)
wcs = Web Coverage Server (grids)
ncss = NetCDF Subset Service (grids)
admin = Administration/debug interface
Note: each server can change the service name in
the XML catalogue.
(Diagram: Tomcat/[Apache] hosting the thredds servlet with catalogs and the dodsC, fileServer, wms, wcs and ncss services.)
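The http://{server:port}/{contextPath}/{service}/... template can be filled in with a small helper. This is only a sketch that assumes the default service names listed on this slide; as noted, a server may rename them. The function name and example.org host are hypothetical.

```python
# Default THREDDS service names from the slide; a server may rename these.
TDS_SERVICES = ("fileServer", "dodsC", "wms", "wcs", "ncss")


def tds_url(server, service, dataset_path, context_path="thredds"):
    """Fill in http://{server:port}/{contextPath}/{service}/{path}."""
    if service not in TDS_SERVICES:
        raise ValueError("unknown service: %r" % service)
    return "http://%s/%s/%s/%s" % (
        server, context_path, service, dataset_path.lstrip("/"))


if __name__ == "__main__":
    path = "example/2011/sst_20111106.nc"  # hypothetical dataset path
    for service in ("fileServer", "dodsC"):
        print(tds_url("example.org:8080", service, path))
```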
Hyrax service syntax
http://{server:port}/{contextPath}/{service}/…
e.g. http://test.opendap.org/opendap/hyrax/...
{contextPath} = “opendap” (servlet default name)
{service} = “hyrax” | “admin” | “docs”
hyrax = catalog interface
admin = administration interface (v1.8+)
docs = documentation (v1.8+)
Note: each server can change the service name
within the server configuration file.
(Diagram: Tomcat/[Apache] hosting the opendap servlet with the hyrax, admin and docs services.)
Hyrax Data Service
• DAP2 and DAP3.x as the protocol develops
• Other dataset responses*
• ASCII & NetCDF renderings of data (not limited to
data natively stored in netCDF)
• RDF
• ISO 19115 and the conformance rubric (Hyrax 1.8)
• Other server responses**
• THREDDS catalogs
(Diagram: Tomcat/[Apache] hosting Hyrax, which provides DAP2, DAP3.x, RDF* and Catalogs** responses.)
Note: Hyrax and TDS are not mutually exclusive;
sites can install both with little extra effort.
Data Discovery and Access
• Data discovery services
• NASA's Global Change Master Directory
− http://gcmd.nasa.gov
• IMOS eMII portal
− http://imosmest.aodn.org.au/geonetwork/srv/en/main.home
− Help --> http://emii1.its.utas.edu.au/drupal/?q=node/25
• TERN AusCover portal
− http://data.auscover.org.au/
• My Ocean portal
− http://www.myocean.eu/web/24-catalogue.php
• TPAC Digital Library
− http://dl.tpac.org.au
• Data access services
• Unidata's THREDDS Data Service
− http://www.unidata.ucar.edu/projects/THREDDS/
• OPeNDAP's Hyrax Data Service
− http://opendap.org/download/hyrax.html
• NOAA's ERDDAP Data Service
− http://coastwatch.pfeg.noaa.gov/erddap
Some of the Technology
in Hyrax
1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and
associated metadata.
2. Supports many formats and data stores: netCDF3, netCDF4, HDF4, HDF5, FreeForm, SQL databases
3. Uses a plug-in based architecture and includes tools to write custom handlers
4. NetCDF Markup Language (NcML) to modify and create virtual aggregations of datasets.
5. OPeNDAP access, a data access method that supports subsetting.
6. Bulk file access through the HTTP protocol.
7. ncISO server provides automated metadata analysis and ISO metadata generation.
8. RDF output: metadata as triples, used with web-based reasoning systems
9. Code that has passed a formal security audit
10. A true multi-system architecture that can fit in a variety of enterprise settings
11. An administrator's interface
DAP Responses
• DAP2 defines three response types:
• DAS: A text document that contains data set attributes
• DDS: A text document that contains data set variable types and names
• DODS: A quasi-multipart MIME document that contains the DDS and
associated binary values for a data request
• DAP3.x defines two additional response types:
• DDX: An XML document that combines both variable type and name
information along with attributes
• DataDDX: A multipart MIME document that combines a DDX with the
associated binary values for a data request
TDS and Hyrax both support DAP2; Hyrax includes support for DAP3, TDS
has support for the DDX
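Since the DDS is a plain text document listing variable types and names, a client can read array shapes from it before requesting any data. The sketch below is a simplification: the sample DDS is hypothetical and contains only plain array declarations, and the regular expressions would need extending for Grid, Structure and Sequence declarations.

```python
import re

# Hypothetical, simplified DDS text containing only array declarations.
SAMPLE_DDS = """Dataset {
    Float64 lat[lat = 720];
    Float64 lon[lon = 1440];
    Float32 analysed_sst[time = 1][lat = 720][lon = 1440];
} example.nc;
"""

# One declaration per line: <type> <name>[dim = size]...;
_DECLARATION = re.compile(
    r"^\s*(\w+)\s+(\w+)((?:\[\w+\s*=\s*\d+\])+);\s*$", re.MULTILINE)
_DIMENSION = re.compile(r"\[(\w+)\s*=\s*(\d+)\]")


def parse_dds_arrays(dds_text):
    """Map each array name to (type, [(dimension_name, size), ...])."""
    variables = {}
    for dtype, name, dims in _DECLARATION.findall(dds_text):
        shape = [(dim, int(size)) for dim, size in _DIMENSION.findall(dims)]
        variables[name] = (dtype, shape)
    return variables


if __name__ == "__main__":
    for name, (dtype, shape) in parse_dds_arrays(SAMPLE_DDS).items():
        print(name, dtype, shape)
```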
Some Definitions
DAP = Data Access Protocol
• Model used to describe the data;
• Request syntax and semantics; and
• Response syntax and semantics.
• The data structure returned to the user
OPeNDAP
• The software that forms the service;
• Numerous implementations (Hyrax (reference), THREDDS, …);
• Core/libraries for client applications and services.
THREDDS / Hyrax
• A service framework (portal) that contains the OPeNDAP service
Decipher the URL
• http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439]
• Given the OPeNDAP data request above, decipher the URL.
− Request protocol? http
− Host name:port? //opendap.bom.gov.au:8080/
− ContextPath? thredds/
− Service? dodsC/
− Unique path to data set? gamssa_4deg/2011/
− Data reference? 20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc
− Return type? ascii
− Return variables? ?lon
− Return variable index range? [0:1:1439] --> [start:stride:end]
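The same decomposition can be done in code. This is a minimal sketch using Python's standard URL parser; the function name and dictionary keys are hypothetical, and the constraint parsing only handles the single-variable, single-range form used in this example.

```python
import re
from urllib.parse import urlsplit

REQUEST = ("http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/"
           "20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439]")


def decipher(url):
    """Split an OPeNDAP request URL into the parts named on the slide."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # The last dot-separated suffix of the file name is the return type.
    data_reference, _, return_type = segments[-1].rpartition(".")
    result = {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "context_path": segments[0],
        "service": segments[1],
        "dataset_path": "/".join(segments[2:-1]),
        "data_reference": data_reference,
        "return_type": return_type,
        "variable": None,
        "index_range": None,
    }
    # Constraint of the form name[start:stride:end].
    match = re.fullmatch(r"(\w+)\[(\d+):(\d+):(\d+)\]", parts.query)
    if match:
        result["variable"] = match.group(1)
        result["index_range"] = tuple(int(n) for n in match.group(2, 3, 4))
    return result


if __name__ == "__main__":
    for key, value in decipher(REQUEST).items():
        print("%-15s %s" % (key, value))
```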
NcML
NetCDF Markup Language
NcML can provide two basic features:
• Augmenting/Modifying data sets with new
• Attributes
• Values
• Combining two or more data sets (i.e., files) in an aggregation
Three kinds of aggregation are supported:
• Tile files
• Join files along an existing axis
• Join files along a new axis
While very powerful, these aggregations are not applicable to every
data set made up of multiple files
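As an illustration of joining files along an existing axis, the sketch below generates a small NcML document. It assumes the NcML 2.2 namespace and the joinExisting aggregation type; the file names, the "time" dimension and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def join_existing_ncml(dim_name, locations):
    """Return NcML text that joins files along an existing dimension."""
    # Serialise with NcML as the default namespace (no prefix).
    ET.register_namespace("", NCML_NS)
    root = ET.Element("{%s}netcdf" % NCML_NS)
    aggregation = ET.SubElement(
        root, "{%s}aggregation" % NCML_NS,
        {"dimName": dim_name, "type": "joinExisting"})
    # One nested <netcdf location="..."/> element per file to combine.
    for location in locations:
        ET.SubElement(aggregation, "{%s}netcdf" % NCML_NS,
                      {"location": location})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(join_existing_ncml("time", ["sst_20111106.nc", "sst_20111107.nc"]))
```

Joining along a new axis would use type "joinNew" instead; the files must otherwise share the same variables and coordinates, which is one reason aggregation does not suit every multi-file data set.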
DAP4 Summary
• DAP (DAP2 and DAP4) is based on datasets built of
variables that share the characteristics of programming
languages
• Constraints are used to subset data on the server
• DAP4 is a REST API
• DAP4 specifies 'modern' web services
– While DAP2 was a data model only, DAP4 includes specification
of the web services
• DAP4 provides more complete support for functions
More Related Content

PPT
Status of HDF-EOS, Related Software and Tools
PPT
PPTX
RDF-Gen: Generating RDF from streaming and archival data
PPTX
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
PDF
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
PPT
HDF-EOS 2/5 to netCDF Converter
PPTX
HDF Update for DAAC Managers (2017-02-27)
Status of HDF-EOS, Related Software and Tools
RDF-Gen: Generating RDF from streaming and archival data
STARE-PODS: A Versatile Data Store Leveraging the HDF Virtual Object Layer fo...
Apache hadoop: POSH Meetup Palo Alto, CA April 2014
HDF-EOS 2/5 to netCDF Converter
HDF Update for DAAC Managers (2017-02-27)

What's hot (20)

PPTX
Moving form HDF4 to HDF5/netCDF-4
PPTX
LDP4j: A framework for the development of interoperable read-write Linked Da...
PDF
H5Coro: The Cloud-Optimized Read-Only Library
PPTX
Scalding by Adform Research, Alex Gryzlov
PPTX
Utilizing HDF4 File Content Maps for the Cloud Computing
PPT
Taylor bosc2010
PPTX
Fiware - communicating with ROS robots using Fast RTPS
PDF
Basics of big data analytics hadoop
PPSX
Open Source Lambda Architecture for deep learning
PPT
Big data processing using HPCC Systems Above and Beyond Hadoop
PPTX
Data Analytics using MATLAB and HDF5
PPTX
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
PPTX
Hadoop project design and a usecase
PDF
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
PPT
PDF
Fast Data Analytics with Spark and Python
PDF
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
PPTX
Streamlined data sharing and analysis to accelerate cancer research
PDF
Big Data technology Landscape
PDF
BDSE 2015 Evaluation of Big Data Platforms with HiBench
Moving form HDF4 to HDF5/netCDF-4
LDP4j: A framework for the development of interoperable read-write Linked Da...
H5Coro: The Cloud-Optimized Read-Only Library
Scalding by Adform Research, Alex Gryzlov
Utilizing HDF4 File Content Maps for the Cloud Computing
Taylor bosc2010
Fiware - communicating with ROS robots using Fast RTPS
Basics of big data analytics hadoop
Open Source Lambda Architecture for deep learning
Big data processing using HPCC Systems Above and Beyond Hadoop
Data Analytics using MATLAB and HDF5
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Hadoop project design and a usecase
The Best of Both Worlds: Unlocking the Power of (big) Knowledge Graphs with S...
Fast Data Analytics with Spark and Python
Flexibility in Metadata Schemes and Standardisation: the Case of CMDI and the...
Streamlined data sharing and analysis to accelerate cancer research
Big Data technology Landscape
BDSE 2015 Evaluation of Big Data Platforms with HiBench
Ad

Viewers also liked (7)

PPTX
Risk assessment of Australian ecosystems. Dr Emma Burns. ACEAS Grand 2014
PPTX
Unifying principles for modelling, Brad Evans, ACEAS Grand 2014
PPTX
Australian seagrass habitats: condition and threats, James Udy, ACEAS Grand 2014
PPTX
Developing an Australian phenology monitoring network, Tim Brown, ACEAS Grand...
PPTX
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
PPTX
Vast lands and variable data: patterns and processes of mammal decline. Chris...
PPTX
Areas de estimulacion temprana
Risk assessment of Australian ecosystems. Dr Emma Burns. ACEAS Grand 2014
Unifying principles for modelling, Brad Evans, ACEAS Grand 2014
Australian seagrass habitats: condition and threats, James Udy, ACEAS Grand 2014
Developing an Australian phenology monitoring network, Tim Brown, ACEAS Grand...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
Vast lands and variable data: patterns and processes of mammal decline. Chris...
Areas de estimulacion temprana
Ad

Similar to Tim Pugh-SPEDDEXES 2014 (20)

PPTX
Edward King SPEDDEXES 2014
PDF
First they have to find it: Getting Open Government Data Discovered and Used
PPTX
HDF and netCDF Data Support in ArcGIS
PPTX
Cni research data_oxford_horstmann_jefferies
PPT
Elag 2012 - Under the hood of 3TU.Datacentrum.
PDF
Dats nih-dccpc-kc7-april2018-prs-uoxf
PPTX
Overview
PPTX
How to use NCI's national repository of big spatial data collections
PDF
Tracking research data footprints - slides
PDF
Unidata's Approach to Community Broadening through Data and Technology Sharing
PPT
Easily Serving and Accessing HDF-EOS2 Datasets Using DODS Technologies
PPT
070726 Igarss07 Barcelona
PPT
2004-11-13 Supersite Relational Database Project: (Data Portal?)
PPT
Srds Pres011120
PDF
20141030 LinDA Workshop echallenges2014 - State of the art in open data infra...
PDF
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
PDF
CCCA Data Centre - Dynamic Data Citation for NetCDF files
PPT
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
PDF
Introduction to DATS v2.2 - NIH May 2017
PPT
jamstec-rew.ppt
Edward King SPEDDEXES 2014
First they have to find it: Getting Open Government Data Discovered and Used
HDF and netCDF Data Support in ArcGIS
Cni research data_oxford_horstmann_jefferies
Elag 2012 - Under the hood of 3TU.Datacentrum.
Dats nih-dccpc-kc7-april2018-prs-uoxf
Overview
How to use NCI's national repository of big spatial data collections
Tracking research data footprints - slides
Unidata's Approach to Community Broadening through Data and Technology Sharing
Easily Serving and Accessing HDF-EOS2 Datasets Using DODS Technologies
070726 Igarss07 Barcelona
2004-11-13 Supersite Relational Database Project: (Data Portal?)
Srds Pres011120
20141030 LinDA Workshop echallenges2014 - State of the art in open data infra...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
CCCA Data Centre - Dynamic Data Citation for NetCDF files
SciDB : Open Source Data Management System for Data-Intensive Scientific Anal...
Introduction to DATS v2.2 - NIH May 2017
jamstec-rew.ppt

More from aceas13tern (20)

PPTX
Ecosystem services and livelihood opportunities, Jeremy Russell-Smith, ACEAS ...
PPTX
Local to national, Dr Lee Belbin, ACEAS Grand 2014
PPTX
Interactive Games to Value and Manage Ecosystem Services. Prof. Bob Costanza....
PPTX
ACEAS rationale, organisation and status. A. Specht, ACEAS Grand 2014
PPTX
Extinction of Northern Quoll. Euan Ritchie ACEAS Grand 2014
PPTX
Avifaunal disarray Ralph MacNally ACEAS Grand 2014
PPTX
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
PPTX
Transformation of Australia’s vegetated landscapes. Richard Thackway ACEAS Gr...
PPTX
Drought-induced mortality. Pat Mitchell, ACEAS Grand 2014
PPTX
Australian seagrass habitats. Kathryn McMahon, ACEAS Grand 2014
PPTX
Animal telemetry, Ross Dwyer ACEAS Grand 2014
PPTX
Productivity and freshwater fish abundance. Jian Yen. ACEAS Grand 2014
PPTX
Indigenous bio cultural knowledge ACEAS Grand 2014 Locke and Clark
PPT
Adaptation pathways for aquatic plants. Patrick Driver ACEAS Grand 2014
PPTX
Genetic impacts and climate change Part B, ACEAS Grand, Vicki Thomson
PPTX
Genetic impacts and climate change Part A, ACEAS Grand, Vicki Thomson
PPTX
Genetic impacts and climate change Part C, ACEAS Grand, Vicki Thomson
PPTX
Aquatic connectivity - Prof. Brian Fry ACEAS Grand
PPTX
Assoc. Prof. Alison Specht ACEAS Grand 2014 "Synthesis Centres internationally"
PPTX
Dr MIchael Vardon, ABS, ACEAS 2014 "Synthesis in environmental accounting"
Ecosystem services and livelihood opportunities, Jeremy Russell-Smith, ACEAS ...
Local to national, Dr Lee Belbin, ACEAS Grand 2014
Interactive Games to Value and Manage Ecosystem Services. Prof. Bob Costanza....
ACEAS rationale, organisation and status. A. Specht, ACEAS Grand 2014
Extinction of Northern Quoll. Euan Ritchie ACEAS Grand 2014
Avifaunal disarray Ralph MacNally ACEAS Grand 2014
Andrew Treloar, overview of ACEAS Data Workflow, ACEAS Grand 2014
Transformation of Australia’s vegetated landscapes. Richard Thackway ACEAS Gr...
Drought-induced mortality. Pat Mitchell, ACEAS Grand 2014
Australian seagrass habitats. Kathryn McMahon, ACEAS Grand 2014
Animal telemetry, Ross Dwyer ACEAS Grand 2014
Productivity and freshwater fish abundance. Jian Yen. ACEAS Grand 2014
Indigenous bio cultural knowledge ACEAS Grand 2014 Locke and Clark
Adaptation pathways for aquatic plants. Patrick Driver ACEAS Grand 2014
Genetic impacts and climate change Part B, ACEAS Grand, Vicki Thomson
Genetic impacts and climate change Part A, ACEAS Grand, Vicki Thomson
Genetic impacts and climate change Part C, ACEAS Grand, Vicki Thomson
Aquatic connectivity - Prof. Brian Fry ACEAS Grand
Assoc. Prof. Alison Specht ACEAS Grand 2014 "Synthesis Centres internationally"
Dr MIchael Vardon, ABS, ACEAS 2014 "Synthesis in environmental accounting"

Recently uploaded (20)

PDF
FOISHS ANNUAL IMPLEMENTATION PLAN 2025.pdf
PPTX

Tim Pugh-SPEDDEXES 2014

  • 1. How OPeNDAP has transformed the way we do science plus snapshots of recent developments and BoM's operational systems built on this technology Tim Pugh SPEDDEXES workshop 17-21 March 2014
  • 2. Evolution • Traditionally… – Scientific research is conducted in a quiet room in isolation utilising unique data, scripts, and code – Scientific collaboration is conducted at conferences with file sharing by FTP or HTTP bulk download • Today – Scientific research is being driven to shared research services and supported infrastructure • To relieve the scientist of laborious developments • To manage more complex machinery • To improve scientific integrity and collaboration • To work within managed and supported infrastructure – Science is moving from file sharing to data sharing collaboration
  • 3. CAWCR Research Data Server • Location: http://opendap.bom.gov.au:8080/thredds • Unidata THREDDS Data Server v4.2.8 • http://www.unidata.ucar.edu/projects/THREDDS/tech/TDS.html • The THREDDS Data Server (TDS) is a Java servlet contained in a single WAR file, which allows very easy installation into the Tomcat web server.
  • 4. OPeNDAP Now Is: • An acronym – “Open-source Project for a Network Data Access Protocol” – Often a synonym for “DAP” • A not-for-profit corp. developing/supporting – “DAPx” - a web-services protocol for data access • Deployed by hundreds of data providers internationally • Employed in many analysis packages (MATLAB, e.g.) • Designated a “Community Standard” by NASA – Server & client implementations* of DAP *Note: there are other implementations
  • 5. BROAD VISION 1. A world in which a single data access protocol is used for the exchange of data between network-based applications regardless of discipline. 2. A layer above TCP/IP providing for syntactic and semantic consistency not available in existing protocols such as FTP.
  • 6. Fundamental Objective of OPeNDAP • The fundamental objective of OPeNDAP and OPeNDAP Inc. is to facilitate internet access to scientific data • This is done by: • Providing a protocol (DAP) to access data over the internet, • Hiding the format (and organization) in which the data are stored from the user, and • Providing subsetting (and other) capabilities for the data at the server • OPeNDAP is based on a multi-tier architecture • OPeNDAP software is open source
  • 7. OPeNDAP Data-Type Philosophy • The OPeNDAP data model has few data types – simplified programming, lowered risk of errors • They are intentionally discipline-neutral – better trans-domain utility and programmer uptake • They nonetheless fill discipline-specific needs – netCDF-like (good in contexts where, e.g., data might represent functions with 4- or 5-D domains) – sequences and selections match DBMS sensibilities
  • 8. TDS Server • TDS is THREDDS Data Server – THREDDS is Thematic Real-time Environmental Distributed Data Services – Middleware to bridge the gap between data providers and data users – THREDDS Data Server (TDS), a web server that provides catalog, metadata, and data access services for scientific datasets. – The TDS is open source, 100% Java, and runs inside the open source Tomcat Servlet container. • Unidata's Common Data Model – merges the OPeNDAP, netCDF, and HDF5 data models to create a common API for scientific data – implemented by the NetCDF Java library – read netCDF, OPeNDAP, HDF5, HDF4, GRIB 1 & 2, BUFR, NEXRAD 2 & 3, GEMPAK, MCIDAS, GINI, among others – A pluggable framework allows other developers to add readers for their own specialized formats. – provides standard APIs for geo-referencing coordinate systems, and specialized queries for scientific feature types like Grid, Point, and Radial datasets
  • 9. Some of the Technology in the TDS 1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and associated metadata. 2. The NetCDF-Java/CDM library reads NetCDF, OPeNDAP, and HDF5 datasets, as well as other binary formats such as GRIB and NEXRAD, presenting essentially an (extended) netCDF view of the data. 3. TDS can use the NetCDF Markup Language (NcML) to modify and create virtual aggregations of datasets. 4. An integrated server provides OPeNDAP access with subsetting. 5. An integrated server provides bulk file access through the HTTP protocol. 6. An integrated server provides data access through the Open Geospatial Consortium (OGC) Web Coverage Service (WCS) protocol, for any "gridded" dataset whose coordinate system information is complete. 7. An integrated server provides data access through the OGC Web Map Service (WMS) protocol, for any "gridded" dataset whose coordinate system information is complete. 8. The integrated ncISO server provides automated metadata analysis and ISO metadata generation.
  • 10. THREDDS Catalog • The goal is… – to simplify the discovery and use of scientific data and to allow scientific publications and educational materials to reference scientific data. – The initial focus was to allow data users to find datasets that are pertinent to their specific education and research needs, access the data, and use them without necessarily downloading the entire file to their local system. – Catalogs are the heart of the data access services and the central THREDDS concept. Catalogs consist of XML documents that describe on-line datasets. – Catalogs can contain arbitrary metadata; however, a standard set of metadata is also defined to bridge to discovery centers • CF (Climate & Forecast) and Unidata Data Discovery metadata
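Because catalogs are XML documents, a client can harvest data-access URLs from them with a few lines of code. The sketch below is illustrative only: the catalog text, the server name example.org and the function name list_dap_urls are assumptions, not part of the talk, and real catalogs often attach services to datasets through inherited metadata, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Namespace of THREDDS InvCatalog 1.0 documents.
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

# Hypothetical catalog, for illustration only.
SAMPLE_CATALOG = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0" name="Example catalog">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Example SST analyses">
    <dataset name="sst_day1.nc" urlPath="example/sst_day1.nc"/>
    <dataset name="sst_day2.nc" urlPath="example/sst_day2.nc"/>
  </dataset>
</catalog>"""


def list_dap_urls(catalog_xml, server):
    """Return (dataset name, OPeNDAP URL) pairs found in a catalog document."""
    root = ET.fromstring(catalog_xml)
    service_tag = "{%s}service" % THREDDS_NS
    dataset_tag = "{%s}dataset" % THREDDS_NS

    # Find the base path of the first OPeNDAP service declared in the catalog.
    base = None
    for svc in root.iter(service_tag):
        if svc.get("serviceType", "").upper() == "OPENDAP":
            base = svc.get("base")
            break
    if base is None:
        return []

    # Only datasets with a urlPath attribute are directly accessible.
    return [(ds.get("name"), server.rstrip("/") + base + ds.get("urlPath"))
            for ds in root.iter(dataset_tag) if ds.get("urlPath")]


for name, url in list_dap_urls(SAMPLE_CATALOG, "http://example.org:8080"):
    print(name, url)
```

The container dataset has no urlPath and is skipped; only the two leaf datasets yield URLs.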
  • 11. Spectrum of Use Cases Application Data Representation OGC data model domain specific geospatial, 1-D, 2-D DAP2 data model domain neutral n-D, time series **DAP4 data model domain neutral new data types and data structures streaming, compressed, chunked Common Data Model (CDM) domain specific Future data model domain neutral?? Application Types Programmatic / Language API FORTRAN, C/C++, JAVA, Python, NetCDF, Java NetCDF Programmatic / Tools NetCDF, NCO, PyDAP Custom Tools: OPeNDAP crawler, ocean_prep Interactive Data Viewer IDV, Panoply, IDL, MATLAB, iPython (matplotlib), NCL, web browser (metadata) Interactive Analysis MATLAB, IDL, iPython, NCL Custom Application: Inundation Modeller Web Application Live Access Server IMOS Data Portal (WMS) Custom Java Servlet Programming DAP2 Legacy Code existing tools DAP2 New Code New tools **DAP4 programming legacy code support **DAP4 programming new data model and protocols streaming support **DAP4 programming Asynchronous access modes, server-side processing Data Access Protocol Metadata Request das, dds, ddx ASCII/Binary Data Request Simple data representation DAP Binary Object Request NcML Data Request aggregation, virtual data sets **DAP4 server-side operations, async access mode, new data model, posting Syntax Return data set info file.nc.dds - readable file.nc.ddx - XML file.nc.asc - ASCII data return Select variables file.nc.dods?var1,var2,var3 subset arrays file.dods?var1[0:1:10] Return file translations file.nc.netcdf - NetCDF file Server-side operations file.nc?GEOLOC() Async access mode ??
Clients Programmatic Access Tsunami inundation modeller, NetCDF, NCO, PyDAP, PyNetCDF, MATLAB, IDL, … Interactive Access Web browser - Catalog MATLAB, IDL, Python, Panoply,… Data Library & Catalog Service metadata harvesting directory listings remote THREDDS services Web Service Java servlet, Java applet Geospatial Information Service OPeNDAP data service Analysis Service Live Access Server Service Capabilities DAP2 response metadata, dods, ASCII / Binary **DAP4 Response async access mode, server-side, streaming, NcML Aggregation service Virtual Data Set Service Remote Data Access Metadata Conversion and RDF metadata definitions, translations (-> ISO) semantics, ontology CF->ISO, CF->WMS, CF->WCS Layered Services Catalogue service WMS, WCS services Authentication Conformance checks CF metadata check ISO metadata check **The DAP4 features listed are my estimate, not the official specification
  • 12. Use Case limitations • Time to access data is dependent on the following factors: • Hardware and network performance • Selection of variables and dimensions • Number of data requests to be issued − Latency inherent in the data request • Number of concurrent accesses to the server
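The interaction between request count and latency can be made concrete with a back-of-envelope model. This is a rough sketch under stated assumptions: requests are issued one after another, each pays a fixed latency, and the link has a constant bandwidth; the function name and the numbers are illustrative, not measurements from any BoM server, and server concurrency is ignored.

```python
def estimated_seconds(n_requests, bytes_per_request, latency_s, bandwidth_bytes_per_s):
    """Crude serial cost model: every request pays latency plus transfer time."""
    per_request = latency_s + bytes_per_request / bandwidth_bytes_per_s
    return n_requests * per_request


# Same 10 MB of data, fetched as one request or as 1000 small requests,
# assuming 0.1 s latency and a 10 MB/s link (illustrative values).
one_big = estimated_seconds(1, 10_000_000, 0.1, 10_000_000)
many_small = estimated_seconds(1000, 10_000, 0.1, 10_000_000)
print("one request: %.1f s, 1000 requests: %.1f s" % (one_big, many_small))
```

Under these assumptions latency dominates the many-request case, which is why selecting variables and index ranges so that data arrives in a few larger requests usually pays off.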
  • 13. DAP-enabled client tools/applications OPeNDAP Clients (partial list) http://opendap.org/whatClients 1. Web browser returning ASCII data 2. Pydap – a pure Python implementation of DAP2 3. NetCDF – a set of software libraries and self-describing, machine-independent data formats with interfaces to Python, FORTRAN, C/C++, and Java 4. NCO – a dozen standalone command-line programs that take netCDF files as input 5. MATLAB – a high-level technical computing language and interactive environment for algorithm development, data visualization, data analysis, and numerical computation 6. Panoply – a cross-platform application which plots geo-gridded arrays from netCDF, HDF and GRIB datasets.
  • 14. Developments by Bureau and CSIRO • Development of web portals for data access services and information systems in climate and environment – Seasonal Climate Outlook Rebuild (Roald de Wit) – Natural Resource Management (NRM) Climate Change Portal (Tim Erwin) – eReef's Marine Quality Dashboard and data services (Jonathon Hodge) – National Environmental Information Infrastructure (NEII) (Andrew Woolf) – CAWCR research data services (Duan Beckett) • Establish Climate Data Publishing services at NCI – NCI, CSIRO, Bureau of Meteorology, CoE CSS – Earth System Grid (ESG) – Climate and Weather Science Laboratory (CWSLab)
  • 16. Project overview • More interactivity and functionality needed • Demand for POAMA multi-week forecast products • Long term view of seamless transition between forecasts • Building upon experiences / technologies from other BoM projects (e.g. MetEye and PASAP/PACCSAP)
  • 17. SCO-R architecture MapCache BOM.Map / BOM.App Custom WMS Service (Python)
  • 18. Climate Futures Climate Futures approach to the provision of regional climate projection information CMAR/CLIMATE ADAPTATION FLAGSHIP Tim Erwin Acknowledgements: Penny Whetton, Kevin Hennessy, John Clarke, David Kent 28 October 2013
  • 19. Climate Data • Processed from climate model data (CMIP3 and CMIP5) • NetCDF file format • 10 variables (temperature, rainfall, humidity...) • 20 year seasonal averages (2030, 2035, ..., 2090) • Base period (1950 – 2005) stored as monthly time span • Catalogued in THREDDS server – Allows DAP access • Django • THREDDS catalogues are parsed and stored – model, variable, dap url, layer name, time span
  • 22. ZOO-Project (WPS Server) • Consists of: Kernel, API, Service • Works with Apache through a cgi file and a conf file • Supports several common programming languages: C/C++, Fortran, Python, PHP, Perl, Java, JavaScript • Used to create area average of gridded data using non-rectangular mask • Predefined mask • Polygon (GML, KML, GEOM) • Not limited to geographic operations
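The area-average operation can be illustrated in a few lines of plain Python. This is not the ZOO-Project service code: it is a minimal sketch assuming a regular latitude-longitude grid, a boolean mask already rasterised from the polygon, and cosine-of-latitude weights to approximate cell area. The function name and the sample numbers are hypothetical.

```python
import math


def masked_area_average(values, mask, lats):
    """Average grid cells selected by a mask, weighting rows by cos(latitude).

    values, mask: 2-D lists indexed [lat][lon]; lats: latitude (degrees) per row.
    Cells whose value is None are treated as missing and skipped.
    """
    total = 0.0
    weight_sum = 0.0
    for row_values, row_mask, lat in zip(values, mask, lats):
        weight = math.cos(math.radians(lat))
        for value, inside in zip(row_values, row_mask):
            if inside and value is not None:
                total += weight * value
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("mask selects no valid cells")
    return total / weight_sum


# A 2 x 2 grid with an L-shaped (non-rectangular) mask.
values = [[1.0, 2.0], [3.0, 4.0]]
mask = [[True, False], [True, True]]
print(masked_area_average(values, mask, lats=[0.0, 60.0]))
```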
  • 23. OPeNDAP Technology Developments • DAP4 protocol and data model implementation (OPULS) – OPULS (an OPeNDAP-Unidata collaboration) – DAP4 (to supersede DAP2) – Experimental extensions (Async access, UGRID subsets) • DAP2 & DAP4 JSON response type – Improve JavaScript client utilisation of DAP services • ncWMS integration and WMS extensions – contour map types – THREDDS and Hyrax integration of ncWMS • Programmatic Data Access for secure services – RDSI DaSh project to support programmatic data access – Integration within reX Identity and Authorisation Management
  • 24. DAP4 Experiments • DAP4 provides more complete support for functions including metadata responses (DAP2 does not provide this; a gap in the DAP2 specification) – Experiments with Unstructured Grid (irregular mesh) subsetting – Binning: returns a distribution (as a raster of boolean values on a user-specified grid) of data values satisfying some criteria – Masking: accepts a raster of zero/nonzero values as a query argument, perhaps as a geospatial selection criterion • OPeNDAP are running several experimental mini-projects within its context: – Asynchronous access, data streaming, cloud computing and an expanded, function-based, server-side processing system
  • 25. thank you – have a great experience Tim F. Pugh HPC and CWSLab Project Lead Melbourne, Victoria, Australia Email: t.pugh@bom.gov.au Office: +61 3 9669 4345
  • 26. Workshop Use-Cases Application Data Representation DAP2 data model domain neutral n-D, time series Application Types Programmatic / Language API FORTRAN, C/C++, JAVA, Python, NetCDF, Java NetCDF, PyDAP Programmatic / Tools NetCDF, NCO, PyDAP Custom Tools: OPeNDAP crawler Interactive Data Viewer Panoply, MATLAB, NCL, web browser Programming DAP2 Legacy Code existing tools DAP2 New Code New tools Data Access Protocol Metadata Request das, dds, ddx ASCII/Binary Data Request Simple data representation DAP Binary Object Request NcML Data Request aggregation Syntax Return metadata info file.nc.das - readable file.nc.dds - readable file.nc.ddx - XML metadata file.nc.help - help info Select vars and return data file.nc.asc?var1,var2,var3 file.nc.dods?var1,var2,var3 subset arrays, return data file.asc?var1[0:1:10] file.dods?var1[0:1:10] Return file translations file.nc.netcdf - NetCDF file Server-side operations file.nc?GEOLOC() Clients Programmatic Access NetCDF, NCO, PyDAP, PyNetCDF Interactive Access Web browser - Catalog Python, MATLAB, Panoply Service Capabilities DAP2 response THREDDS data service Hyrax data service NcML Aggregation service Layered Services Catalog service WMS
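The request syntax listed on this slide is plain string composition: a dataset URL, a response suffix, and an optional constraint that names variables and index ranges. A minimal sketch follows, assuming DAP2's square-bracket form [start:stride:stop] as used in the "Decipher the URL" slide; the helper names and the example.org URL are hypothetical.

```python
def constraint(variable, *ranges):
    """Build one projection term, e.g. var1[0:1:10].

    Each range is a (start, stride, stop) tuple, one per array dimension.
    """
    return variable + "".join("[%d:%d:%d]" % r for r in ranges)


def dap2_url(dataset_url, suffix, *constraints):
    """Append a response suffix (das, dds, ascii, dods, ...) and constraints."""
    url = dataset_url + "." + suffix
    if constraints:
        url += "?" + ",".join(constraints)
    return url


base = "http://example.org/thredds/dodsC/file.nc"  # hypothetical dataset URL
print(dap2_url(base, "dds"))                              # structure only
print(dap2_url(base, "dods", "var1", "var2", "var3"))     # whole variables
print(dap2_url(base, "ascii", constraint("var1", (0, 1, 10))))  # subset
```

Composing the URL by hand is mostly useful for browser checks and scripts; client libraries such as Pydap or netCDF build equivalent requests from array slicing.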
  • 27. Pydap client
    >>> from pydap.client import open_url
    >>> dataset = open_url('http://test.opendap.org/dap/data/nc/coads_climatology.nc')
    >>> var = dataset['SST']
    >>> var.shape
    (12, 90, 180)
    >>> var.type
    <class 'pydap.model.Float32'>
    >>> print(var[0,10:14,10:14])  # this will download data from the server
    <class 'pydap.model.GridType'>
    with data
    [[ -1.26285708e+00 -9.99999979e+33 -9.99999979e+33 -9.99999979e+33]
     [ -7.69166648e-01 -7.79999971e-01 -6.75454497e-01 -5.95714271e-01]
     [ 1.28333330e-01 -5.00000156e-02 -6.36363626e-02 -1.41666666e-01]
     [ 6.38000011e-01 8.95384610e-01 7.21666634e-01 8.10000002e-01]]
    and axes
    366.0
    [-69. -67. -65. -63.]
    [ 41. 43. 45. 47.]
  • 28. NetCDF client
    >>> import netCDF4
    >>> url = 'http://test.opendap.org/dap/data/nc/coads_climatology.nc'
    >>> dataset = netCDF4.Dataset(url)
    >>> var = dataset.variables['SST']
    >>> var.shape
    (12, 90, 180)
    >>> print(var[0,10:14,10:14])  # this will download data from the server
    [[-1.26285707951 -- -- --]
     [-0.769166648388 -0.77999997139 -0.675454497337 -0.595714271069]
     [0.128333330154 -0.0500000156462 -0.0636363625526 -0.141666665673]
     [0.638000011444 0.895384609699 0.721666634083 0.810000002384]]
    >>> print(var)
    <type 'netCDF4.Variable'>
    float32 SST('TIME', 'COADSY', 'COADSX')
    …
  • 29. MATLAB and SNCtools
    % ex_snctools_opendap.m
    % Read from a remote OPeNDAP server with the same file
    %
    ncRef = 'http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc'
    nc_dump( ncRef );
    pause
    temp = nc_varget( ncRef, 'analysed_sst');
    lon = nc_varget( ncRef, 'lon');
    lat = nc_varget( ncRef, 'lat');
    % x axis = longitude, y axis = latitude
    imagesc(lon, lat, temp); axis xy
  • 30. MATLAB and NJTbx demo
    % ex_njtbx.m
    % Read from a remote OPeNDAP server with the same file
    %
    ncRef = 'http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc'
    nj_info( ncRef )
    pause
    [temp, grid] = nj_grid_varget(ncRef,'analysed_sst');
    imagesc(grid.lon, grid.lat, temp); axis xy; colorbar
  • 31. THREDDS services syntax
    http://{server:port}/{contextPath}/{service}/...
    {contextPath} = “thredds” (servlet default name)
    {service} = “fileServer” | “dodsC” | “wms” | “wcs”
    • Bulk File Transfer
      fileServer = HTTP Server (any file)
    • Remote access, subsetting CDM files
      dodsC = OPeNDAP (any CDM file)
      wms = Web Map Server (grids)
      wcs = Web Coverage Server (grids)
      ncss = NetCDF Subset Service (grids)
      admin = Administration/debug interface
    Note: each server can change the service name in the XML catalogue.
    [Diagram: Tomcat/[Apache] hosting the thredds servlet, with Catalogs and the dodsC, fileServer, wms, wcs and ncss services]
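The URL template on this slide means one dataset path can be reached through several services simply by swapping the {service} segment. A small sketch, assuming the default service names shown above (a server may rename them in its catalogue); the function name, host and dataset path are hypothetical.

```python
# Default THREDDS service names from the slide; servers may rename them.
THREDDS_SERVICES = ("fileServer", "dodsC", "wms", "wcs", "ncss")


def service_urls(server, dataset_path, context_path="thredds",
                 services=THREDDS_SERVICES):
    """Map each service name to http://{server:port}/{contextPath}/{service}/{path}."""
    path = dataset_path.lstrip("/")
    return {
        service: "http://%s/%s/%s/%s" % (server, context_path, service, path)
        for service in services
    }


urls = service_urls("example.org:8080", "example/sst_day1.nc")
for service in THREDDS_SERVICES:
    print("%-10s %s" % (service, urls[service]))
```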
  • 32. Hyrax service syntax
    http://{server:port}/{contextPath}/{service}/…
    e.g. http://test.opendap.org/opendap/hyrax/...
    {contextPath} = “opendap” (servlet default name)
    {service} = “hyrax” | “admin” | “docs”
    hyrax = catalog interface
    admin = administration interface (v1.8+)
    docs = documentation (v1.8+)
    Note: each server can change the service name within the server configuration file.
    [Diagram: Tomcat/[Apache] hosting the opendap servlet with the hyrax, admin and docs services]
  • 33. Hyrax Data Service • DAP2 and DAP3.x as the protocol develops • Other dataset responses* • ASCII & NetCDF renderings of data (not limited to data natively stored in netCDF) • RDF • ISO 19115 and the conformance rubric (Hyrax 1.8) • Other server responses** • THREDDS catalogs • Note: Hyrax and TDS are not mutually exclusive; sites can install both with little extra effort. [Diagram: Tomcat/[Apache] hosting Hyrax, which serves DAP2, DAP3.x, RDF* and Catalogs**]
  • 34. Data Discovery and Access • Data discovery services • NASA's Global Change Master Directory − http://gcmd.nasa.gov • IMOS eMII portal − http://imosmest.aodn.org.au/geonetwork/srv/en/main.home − Help --> http://emii1.its.utas.edu.au/drupal/?q=node/25 • TERN AusCover portal − http://data.auscover.org.au/ • My Ocean portal − http://www.myocean.eu/web/24-catalogue.php • TPAC Digital Library − http://dl.tpac.org.au • Data access services • Unidata's THREDDS Data Service − http://www.unidata.ucar.edu/projects/THREDDS/ • OPeNDAP's Hyrax Data Service − http://opendap.org/download/hyrax.html • NOAA's ERDDAP Data Service − http://coastwatch.pfeg.noaa.gov/erddap
  • 35. Some of the Technology in Hyrax 1. THREDDS Dataset Inventory Catalogs provide virtual directories of available data and associated metadata. 2. Supports many formats and data stores: netCDF3, netCDF4, HDF4, HDF5, FreeForm, SQL databases 3. Uses a plug-in based architecture and includes tools to write custom handlers 4. NetCDF Markup Language (NcML) to modify and create virtual aggregations of datasets. 5. OPeNDAP access with subsetting. 6. Bulk file access through the HTTP protocol. 7. ncISO server provides automated metadata analysis and ISO metadata generation. 8. RDF output - metadata as triples; used with web-based reasoning systems 9. Code that has passed a formal security audit 10. A true multi-system architecture that can fit in a variety of enterprise settings 11. An administrator's interface
  • 36. DAP Responses • DAP2 defines three response types: • DAS: A text document that contains data set attributes • DDS: A text document that contains data set variable types and names • DODS: A quasi-multipart MIME document that contains the DDS and associated binary values for a data request • DAP3.x defines two additional response types: • DDX: An XML document that combines both variable type and name information along with attributes • DataDDX: A multipart MIME document that combines a DDX with the associated binary values for a data request • TDS and Hyrax both support DAP2; Hyrax also supports DAP3.x, while TDS supports the DDX response.
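To make the DODS response concrete, the sketch below builds a tiny synthetic payload and takes it apart again. It rests on assumptions that should be checked against the DAP2 specification: that the DDS text and the binary section are separated by a "Data:" line, and that a Float32 array is XDR-encoded as big-endian values preceded by its element count written twice. The payload, dataset name and function names are invented for illustration; no server is contacted.

```python
import struct

MARKER = b"\nData:\n"  # assumed separator between DDS text and binary values


def split_dods(payload):
    """Split a DODS response into (DDS text, binary data section)."""
    dds, sep, body = payload.partition(MARKER)
    if not sep:
        raise ValueError("no 'Data:' separator found")
    return dds.decode("ascii"), body


def decode_float32_array(body):
    """Decode one XDR Float32 array: count, count again, then the values."""
    count, count_again = struct.unpack(">II", body[:8])
    if count != count_again:
        raise ValueError("array length fields disagree")
    return list(struct.unpack(">%df" % count, body[8:8 + 4 * count]))


# Synthetic response for a hypothetical three-element variable.
dds_text = b"Dataset {\n    Float32 lon[lon = 3];\n} example.nc;"
payload = dds_text + MARKER + struct.pack(">II", 3, 3) + struct.pack(">3f", 0.5, 1.0, 1.5)

dds, body = split_dods(payload)
print(dds)
print(decode_float32_array(body))
```

In practice a client library does this decoding; the point is only that the response is a readable DDS header followed by compact binary values.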
  • 37. Some Definitions • DAP = Data Access Protocol – Model used to describe the data; – Request syntax and semantics; and – Response syntax and semantics. – The data structure returned to the user • OPeNDAP – The software that forms the service; – Numerous implementations (Hyrax (reference), THREDDS, …); – Core/libraries for client applications and services. • THREDDS / Hyrax – A service framework (portal) that contains the OPeNDAP service.
  • 38. Decipher the URL • http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439] • Given the OPeNDAP data request above, decipher the URL. − Request protocol? http − Host name:port? //opendap.bom.gov.au:8080/ − ContextPath? thredds/ − Service? dodsC/ − Unique path to data set? gamssa_4deg/2011/ − Data reference? 20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc − Return type? ascii − Return variables? ?lon − Return variable index range? [0:1:1439] --> [start:stride:end]
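The same exercise can be done programmatically. The sketch below only parses the request string; it does not contact the server. It assumes the TDS layout /{contextPath}/{service}/{path}/{file}.{returnType}, and the function names decipher and parse_constraint are hypothetical.

```python
import re
from urllib.parse import urlsplit


def parse_constraint(query):
    """Turn 'lon[0:1:1439]' into [('lon', [(0, 1, 1439)])]."""
    terms = []
    for term in filter(None, query.split(",")):
        name = term.split("[", 1)[0]
        ranges = [tuple(int(n) for n in r.split(":"))
                  for r in re.findall(r"\[([\d:]+)\]", term)]
        terms.append((name, ranges))
    return terms


def decipher(url):
    """Break an OPeNDAP request URL into the parts named on the slide."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    # The last dot-separated suffix of the file name is the return type.
    data_reference, _, return_type = segments[-1].rpartition(".")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "context_path": segments[0],
        "service": segments[1],
        "dataset_path": "/".join(segments[2:-1]),
        "data_reference": data_reference,
        "return_type": return_type,
        "variables": parse_constraint(parts.query),
    }


request = ("http://opendap.bom.gov.au:8080/thredds/dodsC/gamssa_4deg/2011/"
           "20111106-ABOM-L4LRfnd-GLOB-v01-fv01.nc.ascii?lon[0:1:1439]")
for key, value in decipher(request).items():
    print("%-15s %s" % (key, value))
```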
  • 39. NcML (NetCDF Markup Language) NcML can provide two basic features: • Augmenting/modifying data sets with new • Attributes • Values • Combining two or more data sets (i.e., files) in an aggregation Three kinds of aggregation are supported: • Tile files • Join files along an existing axis • Join files along a new axis While very powerful, these aggregations are not applicable to every data set made up of multiple files
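As an illustration of joining files along an existing axis, the sketch below generates a small NcML document with Python's standard library. The file names, the dimension name "time" and the function name are assumptions for the example; the element and attribute names (netcdf, aggregation, dimName, type="joinExisting", location) follow the NcML 2.2 schema as I understand it and should be checked against the Unidata documentation before use on a server.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def join_existing_ncml(dim_name, locations):
    """Return NcML text that joins the given files along an existing dimension."""
    root = ET.Element("netcdf", {"xmlns": NCML_NS})
    aggregation = ET.SubElement(
        root, "aggregation", {"dimName": dim_name, "type": "joinExisting"})
    # One nested <netcdf> element per file taking part in the aggregation.
    for location in locations:
        ET.SubElement(aggregation, "netcdf", {"location": location})
    return ET.tostring(root, encoding="unicode")


# Hypothetical daily files joined along their shared time axis.
print(join_existing_ncml("time", ["sst_day1.nc", "sst_day2.nc"]))
```

The resulting virtual data set appears to clients as a single file with a longer time axis, which is why it does not suit every multi-file collection: the files must share variables and coordinate structure.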
  • 40. DAP4 Summary • DAP (DAP2 and DAP4) is based on datasets built of variables that share the characteristics of programming languages • Constraints are used to subset data on the server • DAP4 is a REST API • DAP4 specifies 'modern' web services – While DAP2 was a data model only, DAP4 includes specification of the web services • DAP4 provides more complete support for functions