Distance-based Indexing for metric space & almost-metric space
Donghui Zhang, Northeastern University

Problem Statement
Given a set S of objects and a metric distance function d(). The similarity search problem is defined as: for an arbitrary object q and a threshold ε, find { o | o ∈ S ∧ d(o, q) < ε }.
Solution without index: for every o ∈ S, compute d(q, o). Not efficient: every query costs |S| distance computations.

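The index-free baseline can be sketched in a few lines. This is only an illustration: the names linear_scan and abs_diff are made up here, and numbers on a line with absolute difference stand in for arbitrary objects with a metric.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def linear_scan(objects: Iterable[T], q: T, eps: float,
                d: Callable[[T, T], float]) -> List[T]:
    """Return every o with d(o, q) < eps by testing each object in turn."""
    return [o for o in objects if d(o, q) < eps]


def abs_diff(x: float, y: float) -> float:
    """Stand-in metric (an assumption for this sketch): distance on a line."""
    return abs(x - y)


print(linear_scan([1, 5, 9, 12], 10, 3, abs_diff))  # [9, 12]
```

Every query calls d once per stored object, which is the cost an index tries to avoid.
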
Metric Function
d(x, x) = 0;
d(x, y) > 0, where x ≠ y;
d(x, y) = d(y, x);
d(x, y) ≤ d(x, z) + d(y, z).

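One way to get a feel for these four conditions is to test them by brute force on a small sample. The helper below is a hypothetical sketch (is_metric is not from the slides), and passing it on a finite sample does not prove that d is a metric everywhere.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the four metric conditions on every pair and triple of points."""
    for x, y in product(points, repeat=2):
        if x == y and d(x, y) != 0:
            return False          # d(x, x) = 0 violated
        if x != y and d(x, y) <= 0:
            return False          # d(x, y) > 0 for x != y violated
        if abs(d(x, y) - d(y, x)) > tol:
            return False          # symmetry violated
    for x, y, z in product(points, repeat=3):
        if d(x, y) > d(x, z) + d(y, z) + tol:
            return False          # triangle inequality violated
    return True


sample = [0, 1, 2, 3]
print(is_metric(sample, lambda x, y: abs(x - y)))    # True
print(is_metric(sample, lambda x, y: (x - y) ** 2))  # False
```

Squared difference fails the last condition, for example d(0, 2) = 4 > d(0, 1) + d(2, 1) = 2.
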
Spatial-Index Approach
If every object can be mapped to a location in space (e.g. a 2-D point), there are existing solutions: R-tree, Quad-tree, X-tree, …
Idea: break space hierarchically into partitions and store objects that are close to each other in the same partition; at query time, prune whole partitions if possible.

Spatial Indexes Do Not Apply
In our problem, objects can be arbitrary and we only know the distance function. E.g. objects can be pictures, dogs, and so on.
How to map a dog to a multi-dimensional point? Not clear. But suppose we have the “magical” distance function.

VP-tree
Vantage point tree, by Peter N. Yianilos, “Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces”, Proc. ACM-SIAM Symposium on Discrete Algorithms, 1993.
Idea: build a binary search tree, where each node corresponds to an object; the root is randomly picked; of the remaining objects, the half closest to the root go in the left subtree and the rest go in the right subtree.

An Example
S = {o1, …, o10}. Randomly pick o1 as root. Compute the distance between o1 and each oi, sort in increasing order of distance, and build the tree recursively.

object:     o3   o7   o6   o9   o10  o2   o8   o5   o4
d(o1, oi):  5    6    18   34   96   102  111  300  401

Root: o1
Left subtree: o3, o7, o6, o9 (largest distance to o1: 34)
Right subtree: o10, o2, o8, o5, o4 (smallest distance to o1: 96)

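The construction can be sketched as follows. Several things here are assumptions made for illustration, not part of the slides: the names Node and build, taking the first object as vantage point instead of a random one (so the run is repeatable), and placing o1 at 0 on a number line with every other object at its listed distance, since the slide gives only the distances to o1.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Node:
    vantage: Any
    max_dl: float = 0.0            # largest d(vantage, o) over the left subtree
    min_dr: float = float("inf")   # smallest d(vantage, o) over the right subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(objects: List[Any], d: Callable[[Any, Any], float]) -> Optional[Node]:
    """Recursively build a VP-tree; the first object serves as vantage point."""
    if not objects:
        return None
    vantage = objects[0]
    rest = sorted(objects[1:], key=lambda o: d(vantage, o))
    half = len(rest) // 2
    near, far = rest[:half], rest[half:]   # closest half goes left
    node = Node(vantage)
    if near:
        node.max_dl = d(vantage, near[-1])
    if far:
        node.min_dr = d(vantage, far[0])
    node.left = build(near, d)
    node.right = build(far, d)
    return node


# o1..o10 as points on a line: o1 = 0, the others at their distance to o1.
points = [0, 102, 5, 401, 300, 18, 6, 111, 34, 96]
root = build(points, lambda x, y: abs(x - y))
print(root.max_dl, root.min_dr)  # 34 96
```

The two stored values match the boundary distances 34 and 96 of the example.
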
Query Processing
Given object q, compute d(q, root). Intuitively, if it’s small, search the left tree; otherwise, search the right tree.
Let maxDL = max{ d(root, oi) | oi ∈ left tree } (stored in the index).
Under what circumstance can we prune the left sub-tree?

To Prune the Left Sub-Tree…
We need:
(1) ∀ oi ∈ left tree, d(q, oi) ≥ ε.
We know: d(q, oi) + d(oi, root) ≥ d(q, root), or d(q, oi) ≥ d(q, root) - d(oi, root), which, since d(oi, root) ≤ maxDL, implies:
(2) d(q, oi) ≥ d(q, root) - maxDL.
To guarantee (1), it’s sufficient to have:
(3) d(q, root) - maxDL ≥ ε.
Summary: given q, compare with tree root. If d(q, root) is so large that (3) is true, we know (1) is true and we can prune the left sub-tree.

To Prune the Right Sub-Tree…
Similarly, we define minDR = min{ d(root, oi) | oi ∈ right tree }.
Given q, compare with tree root. If d(q, root) is so small that minDR - d(q, root) ≥ ε is true, we can prune the right sub-tree.
Note: these prunings are done at each level of the tree.

Can we always prune?
No.
If d(q, root) - maxDL < ε, cannot prune left;
If minDR - d(q, root) < ε, cannot prune right;
Combine together: if minDR - ε < d(q, root) < maxDL + ε, we have to examine both sub-trees.

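Putting the two pruning tests together, a range query over the tree might look like the sketch below. It is written under assumptions: nodes are plain tuples, the first object is used as vantage point rather than a random one, and the names build and range_search are illustrative. The sample data again places o1..o10 of the earlier example on a number line.

```python
def build(objects, d):
    """Node = (vantage, maxDL, minDR, left, right); None is an empty tree."""
    if not objects:
        return None
    vantage = objects[0]
    rest = sorted(objects[1:], key=lambda o: d(vantage, o))
    half = len(rest) // 2
    near, far = rest[:half], rest[half:]
    max_dl = d(vantage, near[-1]) if near else 0.0
    min_dr = d(vantage, far[0]) if far else float("inf")
    return (vantage, max_dl, min_dr, build(near, d), build(far, d))


def range_search(node, q, eps, d):
    """Return all objects o in the tree with d(o, q) < eps."""
    if node is None:
        return []
    vantage, max_dl, min_dr, left, right = node
    dq = d(q, vantage)
    hits = [vantage] if dq < eps else []
    if dq - max_dl < eps:        # left sub-tree cannot be pruned
        hits += range_search(left, q, eps, d)
    if min_dr - dq < eps:        # right sub-tree cannot be pruned
        hits += range_search(right, q, eps, d)
    return hits


def dist(x, y):
    return abs(x - y)


points = [0, 102, 5, 401, 300, 18, 6, 111, 34, 96]
root = build(points, dist)
print(range_search(root, 3, 3, dist))  # [5]
```

For q = 3 and ε = 3, the root test gives minDR - d(q, root) = 96 - 3 = 93 ≥ ε, so the whole right sub-tree is skipped without computing any distance inside it.
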
Almost Metric
Almost Metric was introduced in the paper “Distance Based Indexing for String Proximity Search”, ICDE’03.
It is similar to metric, with the difference that the condition d(x, y) ≤ d(x, z) + d(y, z) is changed to d(x, y) ≤ f * (d(x, z) + d(y, z)) for some constant f.
Can the VP-tree be used in an almost metric space?

A thought on f
Must be: f ≥ 1. Why?
d(x, y) ≤ f * (d(x, z) + d(y, z))
d(x, y) ≤ f * (d(x, y) + d(y, y))    (let z = y)
d(x, y) ≤ f * d(x, y)                (since d(y, y) = 0)
f ≥ 1                                (choose x ≠ y, so that d(x, y) > 0)

To Prune the Right Sub-Tree…
In an almost-metric space, we need:
(1) ∀ oi ∈ right tree, d(q, oi) ≥ ε.
We know: d(oi, root) ≤ f * (d(q, oi) + d(q, root)), or d(q, oi) ≥ d(oi, root)/f - d(q, root), which, since d(oi, root) ≥ minDR, implies:
(2) d(q, oi) ≥ minDR/f - d(q, root).
To guarantee (1), it’s sufficient to have:
(3) minDR/f - d(q, root) ≥ ε.
Summary: given q, compare with tree root. If d(q, root) is so small that (3) is true, we know (1) is true and we can prune the right sub-tree.
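
The almost-metric pruning tests can be sketched as two small predicates. The right-hand test follows the slide above. The left-hand test is not taken from the slides: it is derived here by the same argument, starting from d(q, root) ≤ f * (d(q, oi) + d(oi, root)), which gives d(q, oi) ≥ d(q, root)/f - maxDL. The function names and the choice of squared difference as a sample distance (it obeys the relaxed inequality with f = 2) are assumptions for illustration.

```python
def can_prune_left(dq: float, max_dl: float, eps: float, f: float = 1.0) -> bool:
    """True if no object in the left sub-tree can be within eps of q.

    dq is d(q, root); f = 1 gives the ordinary metric test.
    """
    return dq / f - max_dl >= eps


def can_prune_right(dq: float, min_dr: float, eps: float, f: float = 1.0) -> bool:
    """True if no object in the right sub-tree can be within eps of q."""
    return min_dr / f - dq >= eps


# Squared difference on a line: d(x, y) = (x - y) ** 2, almost metric with f = 2.
# root = 0, right sub-tree holds o = 3 (so minDR = 9), query q = 2, eps = 2.
dq = (2 - 0) ** 2
print(can_prune_right(dq, 9, 2, f=1.0))  # True, yet d(q, o) = 1 < eps: unsafe
print(can_prune_right(dq, 9, 2, f=2.0))  # False: the sub-tree is kept
```

With f = 1 the metric test would wrongly discard o = 3, which lies within ε of q; dividing by f makes the test more conservative, so fewer sub-trees are pruned as f grows.
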
