EXPLORING THE HUBNESS-RELATED PROPERTIES OF OCEANOGRAPHIC SENSOR DATA

Nenad Tomašev
Dunja Mladenić
PRESENTATION OUTLINE


• Hubness and why it matters

• Oceanographic data: overview

• Bad hubs in the measurements

• Visualizing the problematic sensors
WHY IT MATTERS

• Hubness is the skewness (asymmetry) in the distribution of k-occurrences:
  some points (hubs) appear in the k-nearest-neighbor lists of other points
  far more often than the rest

• This often happens in high-dimensional data

• It is, however, a phenomenon that matters only for nearest-neighbor methods

• So why should we care, in general?
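
A minimal sketch of the definition above, using only the Python standard library. The function names (`knn_indices`, `k_occurrences`, `skewness`) and the choice of Manhattan distance are illustrative assumptions, not taken from the slides. It counts how often each point occurs in the k-nearest-neighbor lists of the others and measures the skewness of those counts as the standardized third moment.

```python
from statistics import mean, pstdev

def manhattan(a, b):
    """Sum of absolute coordinate differences between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_indices(points, k, dist=manhattan):
    """For each point, return the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors

def k_occurrences(neighbors, n):
    """Count how many k-NN lists each of the n points appears in."""
    counts = [0] * n
    for nn in neighbors:
        for j in nn:
            counts[j] += 1
    return counts

def skewness(values):
    """Standardized third moment; positive values indicate a long right tail (hubs)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return mean([((v - m) / s) ** 3 for v in values])
```

A clearly positive skewness of the k-occurrence counts is what the slide calls hubness: a few points collect most of the occurrences while many points are rarely or never neighbors.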
WHY IT MATTERS

• Sensor data = streams, time series

• The state of the art for time series classification: a 1-NN classifier
  coupled with an appropriate metric for comparing the time series

• In other words: nearest-neighbor methods are not just occasionally used
  for time series classification, they are considered the state of the art!

• So, hubness matters.
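
For illustration, a 1-NN time series classifier can be sketched in a few lines. The names `predict_1nn` and `manhattan` are hypothetical, and the distance function is a pluggable assumption (DTW or any other series metric could be passed in instead).

```python
def manhattan(a, b):
    """Sum of absolute differences between two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))

def predict_1nn(train_series, train_labels, query, dist=manhattan):
    """Return the label of the training series closest to the query."""
    best = min(range(len(train_series)),
               key=lambda i: dist(train_series[i], query))
    return train_labels[best]

# Toy usage: two labeled series and one query
train = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
labels = ["A", "B"]
prediction = predict_1nn(train, labels, [1.0, 0.0, 1.0])
```

Because the prediction depends entirely on which training series is nearest, a hub that is frequently the nearest neighbor of series from other classes can hurt many predictions at once.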
RELATED WORK

• Radovanović, Nanopoulos, Ivanović: Time-Series Classification in Many
  Intrinsic Dimensions, SDM 2010

• Due to the correlation between subsequent values, not all time series
  are inherently very high-dimensional

• Some, however, are. These time series have been shown to exhibit hubness,
  including bad hubness.

• In such cases, weighting based on bad hubness was shown to help
  (the hw-kNN algorithm)
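
The weighting idea can be sketched as follows. This is an assumption about the form of hw-kNN rather than a transcription from the slides: each point's bad k-occurrence count (how often it is a neighbor of a point with a different label) is standardized, and its vote is scaled by exp(-standardized bad hubness). All function names are hypothetical, and the neighbor lists are assumed to be precomputed.

```python
import math
from statistics import mean, pstdev

def bad_occurrences(neighbors, labels):
    """Count, for each point, its occurrences in k-NN lists of differently labeled points."""
    bad = [0] * len(labels)
    for i, nn in enumerate(neighbors):
        for j in nn:
            if labels[j] != labels[i]:
                bad[j] += 1
    return bad

def hubness_weights(bad):
    """Down-weight bad hubs: weight = exp(-(bad - mean) / std). Assumed form."""
    mu, sigma = mean(bad), pstdev(bad)
    if sigma == 0:
        return [1.0] * len(bad)
    return [math.exp(-(b - mu) / sigma) for b in bad]

def weighted_vote(neighbor_ids, labels, weights):
    """Each neighbor votes for its own label with its hubness weight."""
    votes = {}
    for j in neighbor_ids:
        votes[labels[j]] = votes.get(labels[j], 0.0) + weights[j]
    return max(votes, key=votes.get)
```

With this scheme a point that is often a neighbor of other-class points still takes part in the vote, but its influence shrinks exponentially with its standardized bad hubness.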
ANALYSIS GOALS



• Explore the k-nearest neighbor structure of the oceanographic
  sensor data

• Explore the bad hubness in the data

• Visualize the results
TEST CASE: OCEANOGRAPHIC DATA

• Integrated Ocean Observing System data (http://www.ioos.gov/)

• Nodes spread across the Pacific, the Atlantic and the Great Lakes

• Several sensors at each node, measuring various quantities:
  air temperature, barometric pressure, wind, water level observation,
  water level prediction, salinity, water temperature and conductivity
TEST CASE: OCEANOGRAPHIC DATA

• 20 days' worth of measurements: 10.11.2010 to 30.11.2010

• Sampled every 6 minutes (10 measurements an hour)

• 4801 measurements in total for each sensor

• Missing values: replaced by the average of the closest known values
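
The sample count and the gap-filling rule can be checked with a short sketch. The reading of "closest known values" as the nearest known value before and after each gap is an assumption, as is the name `fill_missing`. The count of 4801 is consistent with 20 days at 10 samples per hour plus one extra sample for the closing endpoint.

```python
SAMPLES_PER_HOUR = 10   # one sample every 6 minutes
DAYS = 20

# 20 * 24 * 10 = 4800 intervals; counting both endpoints gives 4801 samples
n_samples = DAYS * 24 * SAMPLES_PER_HOUR + 1

def fill_missing(series):
    """Replace None entries by the average of the nearest known values on either side.

    At the edges of the series only one side exists, so that single value is used.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        prev_idx = max((j for j in known if j < i), default=None)
        next_idx = min((j for j in known if j > i), default=None)
        candidates = [series[j] for j in (prev_idx, next_idx) if j is not None]
        filled[i] = sum(candidates) / len(candidates)
    return filled
```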
THE EXPERIMENTAL SETUP



• Tested under two different metrics:
   • Manhattan distance
   • Variance of between-series differences
   • Future work: repeat the experiments with DTW (Dynamic Time Warping)

• Defined “Pacific”, “Atlantic” and “Lakes” as location-based labels
  = 3 categories
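
A sketch of the two measures, assuming equal-length series compared point by point. The slides do not define "variance of between-series differences"; the reading used here (population variance of the element-wise differences) is an assumption, and both function names are illustrative.

```python
from statistics import pvariance

def manhattan(a, b):
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def variance_of_differences(a, b):
    """Population variance of the element-wise differences a[i] - b[i]."""
    return pvariance([x - y for x, y in zip(a, b)])

# Two series with the same shape but a constant offset of 10
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
d_manhattan = manhattan(a, b)
d_variance = variance_of_differences(a, b)
```

Under this reading the second measure is zero for any two series that differ only by a constant offset, so it compares the shape of the series rather than their level, while Manhattan distance is sensitive to both.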
SKEWNESS, BAD HUBNESS
CLASS TO CLASS HUBNESS MATRIX, K=3, WIND MEASUREMENTS

               Atlantic   Pacific   Lakes
   Atlantic     0.772      0.186    0.042
   Pacific      0.013      0.987    0.0
   Lakes        0.027      0.014    0.959

   Class order: Atlantic = 1, Pacific = 2, Lakes = 3
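
One way such a matrix could be computed is sketched below. The slide does not state the orientation, so this sketch assumes that rows index the class of the point that occurs as a neighbor, columns index the class of the point in whose k-NN list it occurs, and each row is normalized to sum to 1 (the rows of the matrix above do sum to 1). The function name and the precomputed neighbor lists are hypothetical.

```python
def class_hubness_matrix(neighbors, labels, classes):
    """Row-normalized counts of k-occurrences between classes.

    Entry [r][c] is the fraction of all k-occurrences of class-r points
    that fall in the k-NN lists of class-c points (assumed orientation).
    """
    idx = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for i, nn in enumerate(neighbors):
        for j in nn:
            counts[idx[labels[j]]][idx[labels[i]]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Off-diagonal mass corresponds to bad hubness: occurrences in the neighbor lists of points from another class. In the wind matrix above, the first class has the largest off-diagonal share (0.186 + 0.042).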
WOULD THE HUBNESS-AWARE METHODS HELP?
WIND MEASUREMENTS: SENSOR HUBNESS MAP
WIND MEASUREMENTS: SENSOR HUBNESS MAP
WATER TEMPERATURE: SENSOR HUBNESS MAP
WATER TEMPERATURE: SENSOR HUBNESS MAP
BAROMETRIC PRESSURE: SENSOR HUBNESS MAP
AIR TEMPERATURE: THE BERMUDA TRIANGLE
CONCLUSIONS:

• Bad hubness may be useful for detecting potentially erroneous
  measurement devices

• Some types of measurement streams apparently do exhibit hubness, so
  hubness is a phenomenon of interest when dealing with sensor data

• Hubness-aware methods could potentially be helpful when working with
  sensor data
ACKNOWLEDGEMENTS

This work was supported by the ICT Programme of the EC under PlanetData
(ICTNoE-257641).
THANK YOU FOR YOUR ATTENTION
