Generalized Notions of Data Depth
Spring 2015 Data Reading Seminar
Mukund Raj
12th Mar, 2015
1 / 25
Outline
1 Data Depth Background
What is Data Depth?
Geometrical Data Depth
General Properties of Data Depth
2 Generalized Notions of Data Depth
Functions
Multivariate Curves
Sets
Paths (on a graph)
3 Discussion
Relaxed Formulations
Advantages and Limitations of Data Depth
2 / 25
What is Data Depth?
A means of measuring how deep a data point p is within a
cloud of points {p1, . . . , pn}.
Multivariate data analysis approach to generate order statistics
which capture high-dimensional features and relationships.
Descriptive nonparametric method of statistical analysis.
3 / 25
Why is Data Depth Interesting?
Estimate location from the center outward (with respect to the
parent distribution).
Identify outliers.
Formulate quantitative and graphical methods for analyzing
distributional characteristics such as location and scale, as
well as for hypothesis testing.
Robustness.
4 / 25
Various Formulations of Data Depth
Geometrical (for Data in Euclidean Space)
L2 depth
Mahalanobis depth
Oja depth
Expected convex hull depth
Zonoid depth
Simplicial depth
Halfspace depth (also called Tukey depth or location depth)
Generalized (for Complex Data)
Functional band depth
Depth for multivariate curves
Depth for sets
Depth for paths on a graph
5 / 25
Geometrical data depth
Depth based on distances / volumes
L2 depth
Mahalanobis depth
Oja depth
Depth based on weighted means
Zonoid depth
Expected Convex Hull depth
Depth based on half spaces and simplices
Tukey depth
Simplicial depth
[Mosler 2012]
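As a concrete instance of a distance-based depth, the Mahalanobis depth of a point x is commonly written as 1 / (1 + d²(x)), where d² is the squared Mahalanobis distance from x to the center of the cloud. The sketch below is only illustrative: it assumes the sample mean and sample covariance as plug-in estimates, and the function name `mahalanobis_depth` is hypothetical rather than taken from the slides.

```python
import numpy as np

def mahalanobis_depth(points, cloud):
    """Depth of each row of `points` with respect to the sample `cloud`."""
    cloud = np.asarray(cloud, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    mean = cloud.mean(axis=0)            # assumed location estimate
    cov = np.cov(cloud, rowvar=False)    # assumed scatter estimate
    diff = points - mean
    # Squared Mahalanobis distance (x - mean)^T cov^{-1} (x - mean), row by row.
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return 1.0 / (1.0 + d2)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 2))
# A point near the center should be deeper than a point far from the cloud.
depths = mahalanobis_depth([[0.0, 0.0], [4.0, 4.0]], cloud)
```

The depth equals 1 at the sample mean and decays toward 0 as the point moves away from the cloud.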
6 / 25
General Properties of Data Depth
1 Zero at infinity
2 Maximality at Center
3 Monotonicity
4 Affine Invariance
[Zuo and Serfling, 2000]
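For reference, the four properties can be written out for a depth function D(x; F) of a point x with respect to a distribution F, following the usual statement of these properties. The symbols θ (the center of F), A (a nonsingular matrix) and b (a vector) are notation introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
&\text{1. Zero at infinity:} && D(x; F) \to 0 \ \text{as}\ \lVert x \rVert \to \infty \\
&\text{2. Maximality at center:} && D(\theta; F) = \sup_{x} D(x; F) \\
&\text{3. Monotonicity:} && D(x; F) \le D(\theta + \alpha (x - \theta); F), \quad \alpha \in [0, 1] \\
&\text{4. Affine invariance:} && D(Ax + b; F_{AX + b}) = D(x; F_X)
\end{align*}
```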
7 / 25
Function Ensembles
A function ensemble can be defined as:
$$\{x_i(t),\ i = 1, \dots, n,\ t \in I\}$$
where $I$ is an interval in $\mathbb{R}$ and $x_i : \mathbb{R} \to \mathbb{R}$.
Examples: time series observations such as annual trends of temperature or
precipitation, prices of commodities, heights of children versus
age, etc.
9 / 25
Motivation for Functional Band Depth
Challenges with conventional multivariate analysis of functions
Curve ensembles may be sampled at different points.
Curse of dimensionality with existing methods (e.g.
PCA).
Contribution of [López-Pintado et al. 2009]
Given an ensemble of functions (sampled from a distribution),
a formulation of data depth for each function.
10 / 25
Functional Band Depth Formulation
Figure: A functional band [López-Pintado et al. 2009].
Functional band formulation:
$$g \subset B(f_1, \dots, f_j) \iff \forall x:\ \min_{i \in \{1, \dots, j\}} f_i(x) \le g(x) \le \max_{i \in \{1, \dots, j\}} f_i(x) \tag{1}$$
Functional band depth formulation:
$$BD_j(g) = P\big(g \subset B(f_1, \dots, f_j)\big) \tag{2}$$
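A sample version of the band depth can be sketched by replacing the probability with the fraction of bands, over all choices of j curves from the ensemble, that contain g. The code below is a minimal sketch under stated assumptions: all curves are sampled on a common grid, bands formed with the curve itself are counted, and the name `band_depth` is hypothetical.

```python
from itertools import combinations
import numpy as np

def band_depth(curves, j=2):
    """Sample band depth of each curve; `curves` has shape (n_curves, n_samples)."""
    curves = np.asarray(curves, dtype=float)
    n = curves.shape[0]
    inside = np.zeros(n)
    n_bands = 0
    for idx in combinations(range(n), j):
        lower = curves[list(idx)].min(axis=0)   # pointwise min of the j curves
        upper = curves[list(idx)].max(axis=0)   # pointwise max of the j curves
        # Indicator of g inside the band: g lies in [lower, upper] at every sample.
        inside += np.all((curves >= lower) & (curves <= upper), axis=1)
        n_bands += 1
    return inside / n_bands

# Five vertically shifted copies of one sine curve; the middle one is deepest.
t = np.linspace(0.0, 1.0, 50)
offsets = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
curves = offsets[:, None] + np.sin(2 * np.pi * t)[None, :]
depths = band_depth(curves)  # [0.4, 0.7, 0.8, 0.7, 0.4]
```

Sorting the curves by these values gives the center-outward ordering that a functional boxplot is built from.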
11 / 25
Visualization of Data Depth for Functions
Figure: Visualization of function ensemble [López-Pintado et al. 2009].
Figure: Boxplot visualization of function ensemble [Sun et al. 2011,
Whitaker et al. 2013].
12 / 25
Multivariate Curve Ensembles
A parameterized curve can be defined in terms of an independent parameter $s$ as:
$$c(s) = \tilde{x}(s), \quad c : \mathcal{D} \to \mathcal{R}, \quad \mathcal{D} \subset \mathbb{R}, \quad \mathcal{R} \subset \mathbb{R}^d$$
Hurricane paths.
Brain tractography data.
Pathline ensembles in fluid simulations.
Figure: A synthetic ensemble of multivariate curves [Mirzargar et al. 2014].
13 / 25
Data Depth Formulation for Multivariate Curves
Figure: Band formed by 3 multivariate curves, panels (a) and (b) [López-Pintado et al.
2014, Mirzargar et al. 2014].
Curve band formulation:
$$g \subset B(c_{i_1}, \dots, c_{i_j}) \iff \forall x:\ g(x) \in \operatorname{simplex}\big[c_{i_1}(x), \dots, c_{i_j}(x)\big] \tag{3}$$
Curve band depth formulation:
$$SBD_j(g) = P\big[g \subset B(c_{i_1}, \dots, c_{i_j})\big] \tag{4}$$
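The simplex test in the curve band formulation can be sketched with barycentric coordinates: a point lies in a simplex exactly when it is a convex combination of the vertices with non-negative weights. The code below is a sketch under stated assumptions: curves live in R^d, each band uses d + 1 curves sampled on a common parameter grid, degenerate simplices are treated as containing nothing, and the names `in_simplex` and `in_curve_band` are hypothetical.

```python
import numpy as np

def in_simplex(point, vertices, tol=1e-9):
    """True if `point` (d,) lies in the simplex with `vertices` of shape (d + 1, d)."""
    vertices = np.asarray(vertices, dtype=float)
    # Solve sum_k w_k v_k = point together with sum_k w_k = 1 for the weights w.
    a = np.vstack([vertices.T, np.ones(len(vertices))])
    b = np.append(np.asarray(point, dtype=float), 1.0)
    try:
        weights = np.linalg.solve(a, b)
    except np.linalg.LinAlgError:
        return False  # degenerate simplex (assumption: counts as not containing)
    return bool(np.all(weights >= -tol))

def in_curve_band(g, band_curves):
    """g: (n_samples, d); band_curves: (d + 1, n_samples, d)."""
    g = np.asarray(g, dtype=float)
    band_curves = np.asarray(band_curves, dtype=float)
    # The curve is in the band only if the test holds at every parameter value.
    return all(in_simplex(g[s], band_curves[:, s, :]) for s in range(len(g)))

s = np.linspace(0.0, 1.0, 20)
zeros = np.zeros_like(s)
band = np.array([
    np.column_stack([s - 1.0, zeros]),        # left vertex of the moving triangle
    np.column_stack([s + 1.0, zeros]),        # right vertex
    np.column_stack([s, zeros + 2.0]),        # top vertex
])
g_in = np.column_stack([s, zeros + 0.5])      # stays inside the triangle
g_out = np.column_stack([s, zeros + 3.0])     # stays above the triangle
```

Averaging this indicator over all choices of band curves from the ensemble gives a sample estimate of the depth in equation (4).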
14 / 25
Visualization of Data Depth for Curves
Figure: Chinese script replicated 100 times [López-Pintado 2014].
Figure: Curve boxplot for hurricane path ensemble [Mirzargar et al.
2014].
15 / 25
Set / Isocontour Ensembles
Given an ensemble of real-valued functions $f(x, y)$, consider the
sublevel and superlevel sets of each function for a particular isovalue.
Isocontours of a temperature field.
Isocontours of a pressure field in fluid dynamics simulations.
Figure: A synthetic ensemble of contours [Whitaker et al. 2013].
16 / 25
Data Depth Formulation for Sets
Figure: Examples of set bands [Whitaker et al. 2013].
Set band formulation:
$$S \in sB(S_1, \dots, S_j) \iff \bigcap_{k=1}^{j} S_k \subset S \subset \bigcup_{k=1}^{j} S_k \tag{5}$$
Set band depth formulation:
$$sBD_j(S) = P\big(S \in sB(S_1, \dots, S_j)\big) \tag{6}$$
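The set band test can be sketched on a pixel grid, where each set is a boolean mask and the subset relations become elementwise implications. This is a minimal sketch, assuming all sets are rasterized on the same grid and that bands formed with the set itself are counted; the names `in_set_band` and `set_band_depth` are hypothetical.

```python
from itertools import combinations
import numpy as np

def in_set_band(s, band_sets):
    """True if intersection(band_sets) ⊂ s ⊂ union(band_sets); inputs are boolean masks."""
    s = np.asarray(s, dtype=bool)
    band_sets = np.asarray(band_sets, dtype=bool)
    intersection = band_sets.all(axis=0)
    union = band_sets.any(axis=0)
    # A ⊂ B for masks: wherever A is True, B must be True as well.
    return bool(np.all(~intersection | s) and np.all(~s | union))

def set_band_depth(sets, j=2):
    """Fraction of j-set bands that contain each set; `sets` has shape (n, H, W)."""
    sets = np.asarray(sets, dtype=bool)
    n = sets.shape[0]
    bands = list(combinations(range(n), j))
    hits = np.array([
        sum(in_set_band(sets[i], sets[list(b)]) for b in bands) for i in range(n)
    ])
    return hits / len(bands)

# Five nested disks; the middle one should be the deepest set.
yy, xx = np.mgrid[-6:7, -6:7]
disks = np.array([xx**2 + yy**2 <= r**2 for r in (1, 2, 3, 4, 5)])
depths = set_band_depth(disks)  # [0.4, 0.7, 0.8, 0.7, 0.4]
```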
17 / 25
Visualization of Data Depth for Sets
Figure: Contour boxplot for an ensemble of isocontours of a pressure field,
panels (a) and (b) [Whitaker et al. 2013].
18 / 25
Paths (on a graph)
Let $G = \{V, E, W\}$ be a graph. A path $p$ can be written as
$p : I \to V$, where $I = (1, \dots, m)$ is the index set.
Paths of packets in computer networks.
Paths on transportation networks modelled as graphs.
Figure: A synthetic ensemble of paths on a graph.
19 / 25
Data Depth Formulation for Paths
Figure: Illustration of a band formed by 3 paths.
Path band formulation:
$$p \in B[P_j] \iff p(l) \in H[p_1(l), \dots, p_j(l)] \quad \forall l \in I \tag{7}$$
Path band depth formulation:
$$pBD_j(p) = E\big[\chi(p \in B[P_j])\big] \tag{8}$$
20 / 25
Visualization of Data Depth for Paths
Figure: Path boxplots for paths on AS and road graphs, panels (a) and (b).
21 / 25
Relaxed formulations
1 Modified band depth: instead of an indicator function,
measure the proportion of the object that lies inside the band.
2 Subsets: an indicator function with a relaxed threshold.
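The first relaxation can be sketched for functions: instead of asking whether a curve lies inside a band everywhere, average the fraction of sample points at which it does. The code below is a sketch under stated assumptions: curves are sampled on a common grid, bands are formed from j = 2 curves, and the name `modified_band_depth` is hypothetical.

```python
from itertools import combinations
import numpy as np

def modified_band_depth(curves, j=2):
    """Average fraction of each curve that lies inside the bands of j curves."""
    curves = np.asarray(curves, dtype=float)
    total = np.zeros(curves.shape[0])
    n_bands = 0
    for idx in combinations(range(curves.shape[0]), j):
        lower = curves[list(idx)].min(axis=0)
        upper = curves[list(idx)].max(axis=0)
        # Proportion of sample points inside the band, replacing the 0/1 indicator.
        total += np.mean((curves >= lower) & (curves <= upper), axis=1)
        n_bands += 1
    return total / n_bands

# Three curves that cross each other: a strict all-or-nothing test would put each
# curve inside only the bands it helps form, so all three would tie at 2/3.
t = np.linspace(0.0, 1.0, 10)
curves = np.array([np.zeros_like(t), np.ones_like(t), 2.0 * t])
mbd = modified_band_depth(curves)  # approximately [0.7, 0.833, 0.833]
```

Here the proportion-based measure separates curves that the strict indicator cannot distinguish, which is the practical motivation for the relaxation.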
23 / 25
Advantages and Limitations
For Combinatorial Data Depth Formulations for Complex Data
Advantages
No assumption required for the underlying distribution.
Captures nonlocal relationships.
Robust.
Limitations
Computationally expensive for large ensembles.
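The cost comes from the combinatorics: with n ensemble members and bands formed from j of them, a direct evaluation tests every member against every one of the C(n, j) bands. The short sketch below only counts those tests; the helper name `band_tests` is hypothetical, and faster algorithms for special cases are not considered.

```python
from math import comb

def band_tests(n, j=2):
    """Number of membership tests in a direct evaluation: n members times C(n, j) bands."""
    return n * comb(n, j)

# Growth of the number of bands and of the total number of tests with ensemble size.
bands_per_size = {n: comb(n, 2) for n in (10, 100, 1000)}
tests_per_size = {n: band_tests(n) for n in (10, 100, 1000)}
```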
24 / 25
Thank You
Questions?
25 / 25
