Multidimensional Indexes

        Chapter 5
Outline
•   Applications
•   Hash-like structures
•   Tree-like structures
•   Bit-map indexes
Search Key
• Search key is (F1, F2, ..., Fk)
  – Fields are separated by special markers.
  – Example
     • If F1=abcd and F2=123, the search key is
       “abcd#123”
Applications: GIS
• Geographic Information Systems
   – Objects are in two dimensional space
   – The objects may be points or shapes
   – Objects may be houses, roads, bridges, pipelines and many
     other physical objects.
• Query types
   – Partial match queries
      • Specify the values for one or more dimensions and look for all
        points matching those values in those dimensions.
   – Range queries
      • Set of shapes within the range
   – Nearest neighbor queries
      • Closest point to a given point
   – Where-am-I queries
      • When you click a mouse, the system determines which of the
        displayed elements you were clicking.
Data Cubes
• Data exists in high dimensional space
• A chain store may record each sale made, including
   –   The day and time
   –   The store in which the sale was made
   –   The item purchased
   –   The color of the item
   –   The size of the item
• Attributes are seen as dimensions of a multidimensional
  space, the data cube.
• Typical query
   – Give the sales of pink shirts for each store and each month of
     1998.
Multidimensional queries in SQL
• Represent points as a relation Points(x,y)
• Query
  – Find the nearest point to (10, 20)
  – Compute the distance from (10,20) to
    every point and keep the smallest.
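Without a multidimensional index, the query above amounts to a full scan of Points. A minimal Python sketch, assuming the relation is held in memory as a list of (x, y) tuples; the sample rows and the name `nearest_point` are made up for illustration:

```python
import math

def nearest_point(points, target):
    """Scan every point and return the one closest to `target` (Euclidean distance)."""
    return min(points, key=lambda p: math.dist(p, target))

# Hypothetical rows of Points(x, y).
points = [(0, 0), (12, 18), (10, 25), (30, 40)]
closest = nearest_point(points, (10, 20))
print(closest)
```

Every row is touched once, which is exactly the cost the index structures in the rest of the deck try to avoid.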
Rectangles
• Rectangles(id, xll, yll, xur, yur)
  – (xll, yll) is the lower-left corner, (xur, yur) the upper-right corner
• If the query is
  – Find the rectangles enclosing the point
    (10,20)
• SELECT id
  FROM Rectangles
  WHERE xll <= 10.0 AND yll <= 20.0
  AND xur >= 10.0 AND yur >= 20.0;
Data Cube
• Fact table
  – Sales(day, store, item, color, size)
• Query
  – Summarize the sales of pink shirts by day and store
• SELECT day, store, COUNT(*) AS totalSales
  FROM Sales
  WHERE item = 'shirt' AND color = 'pink'
  GROUP BY day, store;
Executing range queries in
        Conventional Indexes
• Motivation: Find records where
               DEPT = “Toy” AND SAL > 50k
• Strategy I: Use the “Toy” index and check the salary of each record found
• Strategy II: Use the “Toy” index and the SAL index and
  intersect the results.
   – Complexity
      • Toy index: number of disk accesses = number of disk blocks
      • SAL index: number of disk accesses = number of records
• So conventional indexes are of little help.
Executing nearest-neighbor queries using
              conventional indexes
• There might be no point in the selected
  range
  – Repeat the search by increasing the range
• The closest point within the range might
  not be the closest point overall.
  – A point just outside the range may be closer.
Other Limitations of Conventional
                Indexes
• We can keep the file sorted on only one
  attribute.
• If the query is on multiple dimensions
  – We will end up having one disk access for
    each record
• It becomes too expensive
Overview of Multidimensional Index
            Structures
• Hash-table-like approaches
• Tree-like approaches
• In both cases, we give up some properties of
  one-dimensional indexes
• Hash
  – We have to access multiple buckets
• Tree
  – Tree may not be balanced
  – There may not exist a correspondence between tree
    nodes and disk blocks
  – The amount of information in a disk block may be much smaller.
Hash-like structures: Grid Files
• In each dimension, grid lines partition the
  space into stripes.
  – Points that fall on a grid line are
    considered to belong to the stripe for which
    that grid line is the lower boundary
  – The number of grid lines in each dimension
    may vary.
  – The spacing between grid lines in the same
    dimension may also vary
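Locating the stripe for a value is a binary search over the grid lines of that dimension. A minimal sketch, assuming the grid lines are kept as sorted lists; the grid-line values and the names `stripe` and `cell` are hypothetical:

```python
from bisect import bisect_right

def stripe(grid_lines, value):
    """Return the stripe index for `value`.

    Stripe 0 lies below the first grid line. A value exactly on a grid
    line belongs to the stripe whose lower boundary is that line.
    """
    return bisect_right(grid_lines, value)

def cell(x_lines, y_lines, point):
    """Map a 2-D point to its (x stripe, y stripe) grid cell."""
    return stripe(x_lines, point[0]), stripe(y_lines, point[1])

# Hypothetical grid: unevenly spaced lines, different counts per dimension.
x_lines = [40, 55]          # three stripes in x
y_lines = [90, 225, 300]    # four stripes in y

print(cell(x_lines, y_lines, (45, 60)))
print(cell(x_lines, y_lines, (40, 90)))
```

`bisect_right` is used so that a point sitting on a grid line is pushed into the upper stripe, matching the lower-boundary rule above.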
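Grid Index
[Figure: a two-dimensional grid. Rows are the values V1, V2, ..., Vn of Key 1; columns are the values X1, X2, ..., Xn of Key 2. Each cell leads to the records with that pair of values, e.g., key1=V3, key2=X2.]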
CLAIM

• Can quickly find records with
  – key 1 = Vi ∧ key 2 = Xj
  – key 1 = Vi
  – key 2 = Xj

• And also ranges
  – E.g., key 1 ≥ Vi ∧ key 2 < Xj
But there is a catch with Grid Indexes!

  • How is Grid Index stored on disk?
     – Like an array: the entries X1, X2, X3, X4 for V1, followed by the
       entries X1, X2, X3, X4 for V2, then those for V3, and so on
  • Problem:
     – Need regularity so we can compute the
       position of the (Vi, Xj) entry
Solution: Use Indirection

[Figure: a grid with rows V1 to V4 and columns X1 to X3. Each grid cell holds a pointer to a bucket; the buckets are stored separately from the grid.]
• The grid only contains pointers to buckets
With indirection:

• Grid can be regular without wasting space
• We do pay the price of indirection
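The trade-off can be sketched as a flat, regular array of bucket pointers. This is an illustrative in-memory model, not an on-disk layout; the class `GridDirectory` and its method names are hypothetical:

```python
class GridDirectory:
    """Regular grid of bucket ids; several cells may share one bucket."""

    def __init__(self, n_rows, n_cols):
        self.n_cols = n_cols
        self.cells = [None] * (n_rows * n_cols)  # fixed-size entries: just pointers
        self.buckets = {}                        # bucket id -> list of records

    def position(self, i, j):
        # Regularity: the array slot for cell (i, j) is computed, not searched.
        return i * self.n_cols + j

    def assign(self, i, j, bucket_id):
        self.cells[self.position(i, j)] = bucket_id
        self.buckets.setdefault(bucket_id, [])

    def insert(self, i, j, record):
        self.buckets[self.cells[self.position(i, j)]].append(record)

    def lookup(self, i, j):
        # One extra hop (the indirection) from grid cell to bucket.
        return self.buckets[self.cells[self.position(i, j)]]

grid = GridDirectory(2, 2)
grid.assign(0, 0, "A")
grid.assign(0, 1, "A")   # sparse cells share a bucket, so no space is wasted
grid.assign(1, 0, "B")
grid.assign(1, 1, "C")
grid.insert(0, 0, "r1")
grid.insert(0, 1, "r2")
grid.insert(1, 1, "r3")
print(grid.lookup(0, 0))
```

Because cells (0, 0) and (0, 1) share bucket A, a lookup may return records from a neighboring cell; a real implementation would filter them by key.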
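Can also index grid on value ranges

• Salary (linear scale): 0-20K → 1, 20K-50K → 2, 50K and above → 3
• Department (linear scale): Toy → 1, Sales → 2, Personnel → 3
• The grid is addressed by these scale positions rather than by raw values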
Grid files

+ Good for multiple-key search
- Space, management overhead (nothing is free)
- Need partitioning ranges that evenly split keys
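Partitioned Hash Functions

• Idea: the bucket address is the concatenation of the bits produced by
  one hash function per key
  – h1(Key1) gives the first bits, e.g., 010110
  – h2(Key2) gives the remaining bits, e.g., 1110010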
EX:
• h1(toy) = 0, h1(sales) = 1, h1(art) = 1, ...
• h2(10k) = 01, h2(20k) = 11, h2(30k) = 01, h2(40k) = 00, ...
• Buckets 000 through 111: the bucket address is the h1 bit followed by the h2 bits
• Insert (Fred, toy, 10k), (Joe, sales, 10k), (Sally, art, 30k)
  – Fred goes to bucket 001
  – Joe and Sally both go to bucket 101
EX (continued):
• Hash functions: h1(toy) = 0, h1(sales) = 1, h1(art) = 1; h2(10k) = 01, h2(20k) = 11, h2(30k) = 01, h2(40k) = 00
• Bucket contents as listed on the slides: 000 Fred; 001 Joe, Jan; 010 Mary; 011 (empty); 100 Sally; 101 (empty); 110 Tom, Bill; 111 Andy
• Find Emp. with Dept. = Sales ∧ Sal = 40k
  – h1(sales) = 1 and h2(40k) = 00, so look only in bucket 100
• Find Emp. with Sal = 30k
  – h2(30k) = 01 and the h1 bit is unknown, so look in buckets 001 and 101
• Find Emp. with Dept. = Sales
  – h1(sales) = 1 and the h2 bits are unknown, so look in buckets 100, 101, 110, and 111
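The bucket arithmetic for partial-match queries can be sketched as follows. The hash values are the ones given on the slides (stored here as lookup tables rather than real hash functions); the function names `bucket` and `candidate_buckets` are hypothetical:

```python
# Hash values from the slides, kept as bit strings.
H1 = {"toy": "0", "sales": "1", "art": "1"}                   # 1 bit for Dept
H2 = {"10k": "01", "20k": "11", "30k": "01", "40k": "00"}     # 2 bits for Sal

def bucket(dept, sal):
    """Bucket address = h1 bits followed by h2 bits."""
    return H1[dept] + H2[sal]

def candidate_buckets(dept=None, sal=None):
    """Buckets that must be examined when one key may be unspecified."""
    prefixes = [H1[dept]] if dept is not None else ["0", "1"]
    suffixes = [H2[sal]] if sal is not None else ["00", "01", "10", "11"]
    return [p + s for p in prefixes for s in suffixes]

print(bucket("toy", "10k"))                 # where (Fred, toy, 10k) is stored
print(candidate_buckets("sales", "40k"))    # both keys known: one bucket
print(candidate_buckets(sal="30k"))         # Dept unknown: two buckets
print(candidate_buckets(dept="sales"))      # Sal unknown: four buckets
```

Each unspecified key multiplies the number of buckets to visit by the number of bit patterns its hash function can produce, which is the price paid for partial-match support.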
Tree-like Structures
•   Multiple-key indexes
•   Kd-trees
•   Quad trees
•   R-trees
Multiple-key indexes
• Several attributes representing
  dimensions of data points
• Multiple key index is an index of indexes in
  which the nodes at each level are indexes
  for one attribute.
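Strategy:

• Multiple Key Index
• One idea: an index I1 on the first attribute whose entries point to
  indexes I2, I3, ... on the second attribute
[Figure: I1 pointing to I2 and I3.]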
Example
[Figure: a Dept index with entries Art, Sales, Toy. Entries point to salary indexes; the two salary indexes shown hold 10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k. Example record: Name=Joe, DEPT=Sales, SAL=15k.]
For which queries is this index good?

  Find RECs Dept = “Sales” ∧ SAL = 20k
  Find RECs Dept = “Sales” ∧ SAL in a range bounded by 20k
  Find RECs Dept = “Sales”
  Find RECs SAL = 20k
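A small model makes the answer visible. This sketch assumes a dictionary for the Dept index and a sorted list per department for the salary index; apart from (Joe, Sales, 15k), which appears on the example slide, the records and all function names are hypothetical:

```python
from bisect import bisect_left, insort

index = {}  # Dept index: dept -> salary index (sorted list of (salary, name))

def insert(dept, salary, name):
    insort(index.setdefault(dept, []), (salary, name))

def find(dept, min_salary=None):
    """Dept given, optionally with a lower bound on salary: one index path."""
    entries = index.get(dept, [])
    if min_salary is None:
        return [name for _, name in entries]
    start = bisect_left(entries, (min_salary,))  # first entry with salary >= bound
    return [name for _, name in entries[start:]]

def find_by_salary(salary):
    """Salary only: every per-department salary index must be visited."""
    return [name for entries in index.values()
            for s, name in entries if s == salary]

insert("Sales", 15, "Joe")   # record from the example slide (salary in thousands)
insert("Sales", 21, "Ann")   # hypothetical
insert("Toy", 12, "Bob")     # hypothetical
insert("Toy", 15, "Eve")     # hypothetical

print(find("Sales", 20))
print(find("Sales"))
print(find_by_salary(15))
```

Queries that fix Dept follow a single path through the structure, while a query on SAL alone has to descend into the salary index of every department.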
KD-trees
• A K-dimensional (KD) tree generalizes the binary
  search tree to multidimensional data.
• A KD tree is a binary tree in which each interior
  node has an associated attribute “a”
  and a value “v” that splits the data into two
  parts.
• The attributes at different levels of the tree
  are different, with levels rotating among
  the attributes of all dimensions.
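The definition can be sketched for two dimensions. Assumptions in this sketch: the split value is the median on the current attribute, points with a smaller value go left and the rest go right, and a leaf holds at most two points. The sample points are the (age, salary) pairs of the gold jewelry data used in the bitmap section of this deck:

```python
def build(points, depth=0, leaf_size=2):
    """Build a KD tree; levels alternate between attribute 0 (x) and 1 (y)."""
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    value = ordered[len(ordered) // 2][axis]       # median as the split value v
    left = [p for p in ordered if p[axis] < value]
    right = [p for p in ordered if p[axis] >= value]
    if not left:   # median equals the minimum on this axis: stop splitting
        return {"leaf": ordered}
    return {"axis": axis, "value": value,
            "left": build(left, depth + 1, leaf_size),
            "right": build(right, depth + 1, leaf_size)}

def range_search(node, lo, hi):
    """Return points p with lo[i] <= p[i] <= hi[i] in both dimensions."""
    if "leaf" in node:
        return [p for p in node["leaf"]
                if all(lo[i] <= p[i] <= hi[i] for i in range(2))]
    a, v = node["axis"], node["value"]
    found = []
    if lo[a] < v:        # the query range reaches into the left part
        found += range_search(node["left"], lo, hi)
    if hi[a] >= v:       # the query range reaches into the right part
        found += range_search(node["right"], lo, hi)
    return found

points = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]
tree = build(points)
print(sorted(range_search(tree, (45, 100), (55, 200))))
```

A range query descends only into subtrees whose side of the split overlaps the query range, so whole regions of the space are skipped.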
Interesting application:

 • Geographic Data
    – DATA: <X1, Y1, Attributes>, <X2, Y2, Attributes>, ...
[Figure: points plotted in the x-y plane.]
Queries:

• What city is at Xi,Yi?
• What is within 5 miles of Xi,Yi?
• Which point is closest to Xi,Yi?
Example
[Figure, built up over several slides: fifteen points labeled a through o in the x-y plane, with axis ticks at 10 and 20 on x and at 10, 20, 30, 40 on y. Beside it, the matching KD tree: interior nodes carry split values (the labels shown are 10, 20, 25, 15, 35, 20, 5, 15, 15) and the leaves hold the point groups (h i), (g), (f), (d e), (c), (a b), (j k), (l), (m), (n o).]
• Search points near f
• Search points near b
Queries

•   Find points with Yi in a range bounded by 20
•   Find points with Xi in a range bounded by 5
•   Find points “close” to i = (12, 38)
•   Find points “close” to b = (7, 24)
Quad trees
• Divides multidimensional space into
  quadrants and divides each quadrant the
  same way if it has too many points.
• If the number of points in a square
  – fits in a block, the square is a leaf node
  – no longer fits in a block, the square becomes an interior
    node whose four quadrants are its children.
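The split rule can be sketched for two dimensions. In this sketch `capacity` stands in for "fits in a block", the class name `QuadTree` and its fields are hypothetical, and a point on a dividing line is assigned to the upper or right quadrant by assumption:

```python
class QuadTree:
    """Square region [x0, x0+size) x [y0, y0+size) holding 2-D points."""

    def __init__(self, x0, y0, size, capacity=2):
        self.x0, self.y0, self.size, self.capacity = x0, y0, size, capacity
        self.points = []       # used while this node is a leaf
        self.children = None   # four sub-squares once this node is interior

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Too many points for one block: become an interior node.
        # The size guard stops endless splitting on duplicate points.
        if len(self.points) > self.capacity and self.size > 1e-9:
            self._split()

    def _split(self):
        half = self.size / 2
        self.children = [QuadTree(self.x0 + dx * half, self.y0 + dy * half,
                                  half, self.capacity)
                         for dy in (0, 1) for dx in (0, 1)]
        moved, self.points = self.points, []
        for p in moved:
            self._child_for(p).insert(p)

    def _child_for(self, p):
        half = self.size / 2
        dx = 1 if p[0] >= self.x0 + half else 0
        dy = 1 if p[1] >= self.y0 + half else 0
        return self.children[dy * 2 + dx]   # order: SW, SE, NW, NE

tree = QuadTree(0, 0, 100, capacity=2)
for p in [(10, 10), (20, 20), (80, 80)]:   # hypothetical points
    tree.insert(p)
print(tree.children[0].points, tree.children[3].points)
```

The third insert overflows the root, which then becomes an interior node with four quadrant children; unlike a KD tree, the split position is fixed at the midpoint rather than chosen from the data.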
Quad tree pictures
R-tree
• Captures the spirit of B-tree for multidimensional
  data.
• Represents a collection of regions by grouping
  them into a hierarchy of larger regions.
• Data is divided into regions.
• An interior node corresponds to an interior region
  – A region can be of any shape
     • Rectangles are popular
  – Its children correspond to sub-regions.
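A where-am-I style search over such a hierarchy can be sketched with rectangular regions. This sketch covers searching only (no insertion or node splitting); the dictionary layout, the object names, and the coordinates are hypothetical, and rectangles are written as (xlo, ylo, xhi, yhi):

```python
def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def search(node, query):
    """Return ids of stored objects whose rectangle intersects `query`."""
    if not intersects(node["region"], query):
        return []                      # prune the whole subtree
    if "objects" in node:              # leaf: check the actual data rectangles
        return [oid for oid, rect in node["objects"] if intersects(rect, query)]
    found = []
    for child in node["children"]:     # interior: descend into sub-regions
        found += search(child, query)
    return found

leaf1 = {"region": (0, 0, 10, 6),
         "objects": [("road", (0, 0, 10, 2)), ("house", (4, 4, 6, 6))]}
leaf2 = {"region": (20, 20, 30, 22),
         "objects": [("pipeline", (20, 20, 30, 22))]}
root = {"region": (0, 0, 30, 22), "children": [leaf1, leaf2]}

print(search(root, (5, 5, 5, 5)))      # a point query as a degenerate rectangle
```

Each interior region bounds everything beneath it, so a subtree is skipped as soon as its region misses the query, much like a B-tree skips key ranges.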
R-tree
Bitmap indexes
• Assume that records have permanent numbers 1, ..., n
• A bit-map index for field F is a collection of bit vectors of length n,
  one for each value that may appear in F.
• The vector for value v has 1 in position i if the i’th record
  has v in field F, and it has 0 there if not.
• Example for F and G fields:
   – (30, foo), (30, bar), (40, baz), (50, foo), (40, bar), (30, baz)
   – The bit index for F has three vectors, each of 6 bits. For 30 it is 110001, for 40 it is
     001010, and for 50 it is 000100.
   – The bit index for G also has three vectors. For foo it is 100100, for
     bar it is 010010, and for baz it is 001001.
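The vectors in the example can be reproduced with a few lines. A minimal sketch that keeps each vector as a string of '0' and '1' characters for readability; the name `bitmap_index` is hypothetical:

```python
def bitmap_index(values):
    """Map each distinct value to its bit vector (position i = i'th record)."""
    n = len(values)
    index = {}
    for i, v in enumerate(values):
        index.setdefault(v, ["0"] * n)[i] = "1"
    return {v: "".join(bits) for v, bits in index.items()}

records = [(30, "foo"), (30, "bar"), (40, "baz"),
           (50, "foo"), (40, "bar"), (30, "baz")]

f_index = bitmap_index([f for f, _ in records])
g_index = bitmap_index([g for _, g in records])
print(f_index)
print(g_index)
```

Every record sets exactly one bit per field, so the vectors of one field are disjoint and together cover all n positions.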
Bit map indexes: Partial match
• Bit maps allow partial match queries to be
  answered quickly and efficiently.
• Example:
   – Movie(title, year, length, studioName)
   – SELECT title
     FROM Movie
     WHERE studioName = 'Disney' AND year = 1965;
• If we have bitmap indexes for studioName and year, then the
  bitwise AND of the vectors for 'Disney' and 1965 gives the result.
• Bit vectors do not occupy much space.
Bitmap indexes: range queries
•   Example: consider the gold jewelry data of twelve points (age, salary in thousands)
    – 1(25,60), 2(45,60), 3(50,75), 4(50,100), 5(50,120), 6(70,110),
      7(85,140), 8(30,260), 9(25,400), 10(45,350), 11(50,275),
      12(60,260)
• Age has seven different values
    – 25(100000001000), 30(000000010000), 45(010000000100),
      50(001110000010), 60(000000000001), 70(000001000000),
      85(000000100000)
• Salary has 10 different values
    –   60(110000000000), 75(001000000000), 100(000100000000)
    –   110(000001000000), 120(000010000000), 140(000000100000)
    –   260(000000010001), 275(000000000010), 350(000000000100)
    –   400(000000001000)
Example continued
• Find the jewelry buyers with an age in the range 45-55 and a
  salary in the range 100-200
• Find the bit vectors for the age values in the range
  and take their OR
   – 010000000100 (for 45) and 001110000010 (for 50)
   – Result: 011110000110
• Find the bit vectors for salaries between 100 and 200
  thousand.
   – There are four: 100, 110, 120, and 140; their bitwise OR is
     000111100000
• Take the AND of both bit vectors
   – 000110000000
   – The two records in the range are 4 (50,100) and 5 (50,120).
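The same steps can be replayed in code. This sketch stores each bit vector as a Python integer with record 1 in the leftmost bit; the helper names `build` and `or_range` are hypothetical:

```python
records = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
           (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]
n = len(records)

def build(values):
    """Bitmap index as {value: int}; record i+1 maps to bit (n - 1 - i)."""
    index = {}
    for i, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << (n - 1 - i))
    return index

def or_range(index, lo, hi):
    """OR together the vectors of all values v with lo <= v <= hi."""
    result = 0
    for v, bits in index.items():
        if lo <= v <= hi:
            result |= bits
    return result

age_bits = or_range(build([a for a, _ in records]), 45, 55)
sal_bits = or_range(build([s for _, s in records]), 100, 200)
both = age_bits & sal_bits

print(format(age_bits, "012b"))
print(format(sal_bits, "012b"))
print(format(both, "012b"))
matches = [records[i] for i in range(n) if (both >> (n - 1 - i)) & 1]
print(matches)
```

Only the final vector is used to fetch records, so the data file is touched just for the two matching rows.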
Compressed bitmaps
• If the number of different values is large, then
  1’s are rare in each vector.
• Run-length coding is used
   – Each run is a sequence of 0’s followed by a 1.
   – Example: 000101 consists of two runs, of lengths 3 and 1. In binary
     these lengths are 11 and 1. Simply concatenating them gives 111,
     which cannot be decoded unambiguously, so each length also needs a
     prefix that says how many bits it occupies.
• To save space, bitmap indexes that
  consist of vectors with very few 1’s are
  compressed using run-length coding.
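One way to make the run lengths decodable is sketched below. The prefix scheme is an assumption, not something the slide spells out: a run length is written in binary using j bits and is preceded by j-1 ones and a single zero. Trailing 0's after the last 1 are not stored, so the decoder needs the vector length n:

```python
def encode(bits):
    """Run-length encode a bit string; each run is some 0's ended by a 1."""
    out = []
    run = 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            binary = format(run, "b")                 # run length, j bits
            out.append("1" * (len(binary) - 1) + "0" + binary)
            run = 0
    return "".join(out)                               # trailing 0's dropped

def decode(code, n):
    """Rebuild the original bit string of length n."""
    bits = []
    pos = 0
    while pos < len(code):
        j = 1
        while code[pos] == "1":                       # count the j-1 leading ones
            j += 1
            pos += 1
        pos += 1                                      # skip the terminating zero
        run = int(code[pos:pos + j], 2)
        pos += j
        bits.append("0" * run + "1")
    vector = "".join(bits)
    return vector + "0" * (n - len(vector))           # restore trailing 0's

code = encode("000101")
print(code)
print(decode(code, 6))
```

Under this scheme the runs 3 and 1 become 1011 and 01, and the decoder always knows where one run length ends and the next begins.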
Finding bit vectors
• Use any index technique to map a
  value to its bit vector.
• A B-tree on the values, with leaves pointing to the bit vectors, is a good choice.

More from Digvijay Singh (14)

PPT
Week3 applications
PPT
PPT
PPT
Week1.2 intro
PPT
Networks
PPT
Uncertainty
PPT
Overfitting and-tbl
PPTX
Ngrams smoothing
PPT
Query execution
PPT
Query compiler
PPT
Machine learning
PPT
Hmm viterbi
PPT
3 fol examples v2
PPT
Bayesnetwork
Week3 applications
Week1.2 intro
Networks
Uncertainty
Overfitting and-tbl
Ngrams smoothing
Query execution
Query compiler
Machine learning
Hmm viterbi
3 fol examples v2
Bayesnetwork

Recently uploaded (20)

PDF
Computing-Curriculum for Schools in Ghana
PDF
RMMM.pdf make it easy to upload and study
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
Trump Administration's workforce development strategy
PDF
01-Introduction-to-Information-Management.pdf
PPTX
Orientation - ARALprogram of Deped to the Parents.pptx
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PPTX
Cell Types and Its function , kingdom of life
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
DOC
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
PPTX
Cell Structure & Organelles in detailed.
Computing-Curriculum for Schools in Ghana
RMMM.pdf make it easy to upload and study
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
Module 4: Burden of Disease Tutorial Slides S2 2025
VCE English Exam - Section C Student Revision Booklet
Trump Administration's workforce development strategy
01-Introduction-to-Information-Management.pdf
Orientation - ARALprogram of Deped to the Parents.pptx
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
202450812 BayCHI UCSC-SV 20250812 v17.pptx
STATICS OF THE RIGID BODIES Hibbelers.pdf
Cell Types and Its function , kingdom of life
Supply Chain Operations Speaking Notes -ICLT Program
Pharmacology of Heart Failure /Pharmacotherapy of CHF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
Cell Structure & Organelles in detailed.

Multidimensional Indexing

  • 2. Outline • Applications • Hash-like structures • Tree-like structures • Bit-map indexes
  • 3. Search Key • Search key is (F1, F2, ..Fk) – Separated by special markers. – Example • If F1=abcd and F2=123, the search key is “abcd#123”
  • 4. Applications: GIS • Geographic Information Systems – Objects are in two dimensional space – The objects may be points or shapes – Objects may be houses, roads, bridges, pipelines and many other physical objects. • Query types – Partial match queries • Specify the values for one or more dimensions and look for all points matching those values in those dimensions. – Range queries • Set of shapes within the range – Nearest neighbor queries • Closest point to a given point – Where-am-I queries • When you click a mouse, the system determines which of the displayed elements you were clicking.
  • 5. Data Cubes • Data exists in high dimensional space • A chain store may record each sale made, including – The day and time – The store in which the sale was made – The item purchased – The color of the item – The size of the item • Attributes are seen as dimensions multidimensional space, data cube. • Typical query – Give a class of pink shirts for each store and each month of 1998.
  • 6. Multidimensional queries in SQL • Represent points as a relation Points(x,y) • Query – Find the nearest point to (10, 20) – Compute the distance between (10,20) to every other point.
  • 7. Rectangles • Rectangles(id, x11, y11, xur, yur) • If the query is – Find the rectangles enclosing the point (10,20) • SELECT id FROM Rectangles WHERE x11 <= 10.0 AND y11 <= 20.0 AND xur >=10.0 AND yur >=20.0;
  • 8. Data Cube • Fact table – Sales(day,store, item. color, size) • Query – Summarize the sale of pink shirts by day and store • SELECT day, store, COUNT(*) AS totalSales FROM Sales WHERE item=‘shirt’ AND color=‘pink’ GROUP BY day, store;
  • 9. Executing range queries in Conventional Indexes • Motivation: Find records where DEPT = “Toy” AND SAL > 50k • Strategy 1: Use “Toy” index and check salary • Strategy II: Use “Toy” index and SAL index and intersect. – Complexity • Toy index: number of disk accesses = number of disk blocks • SAL index: number of disk accesses= number of records • So conventional indexes of little help.
  • 10. Executing nearest-neighbor queries using conventional indexes • There might be no point in the selected range – Repeat the search by increasing the range • The closest point within the range might not be the closest point overall. – There is one point closer from outside range.
  • 11. Other Limitations of Conventional Indexes • We can only keep the file sorted on only attribute. • If the query is on multiple dimensions – We will end-up having one disk access for each record • It becomes too expensive
  • 12. Overview of Multidimensional Index Structures • Hash-table-like approaches • Tree-like approaches • In both cases, we give-up some properties of single dimensional indexes • Hash – We have to access multiple buckets • Tree – Tree may not be balanced – There may not exist a correspondence between tree nodes and disk blocks – Information in the disk block is much smaller.
  • 13. Hash like structures: GRID Files • Each dimension, grid lines partition the space into stripes, – Points that fall on a grid line will be considered to belong to the stripe for which that grid line is lower boundary – The number of grid lines in each dimension may vary. – Space between grid lines in the same dimension may also vary
  • 14. Grid Index Key 2 X1 X2 …… Xn V1 V2 Key 1 Vn To records with key1=V3, key2=X2 14
  • 15. CLAIM • Can quickly find records with – key 1 = Vi ∧ Key 2 = Xj – key 1 = Vi – key 2 = Xj 15
  • 16. CLAIM • Can quickly find records with – key 1 = Vi ∧ Key 2 = Xj – key 1 = Vi – key 2 = Xj • And also ranges…. – E.g., key 1 ≥ Vi ∧ key 2 < Xj 16
  • 17. But there is a catch with Grid Indexes! • How is Grid Index stored on disk? V1 V2 V3 Like Array... X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X3 X4 17
  • 18. But there is a catch with Grid Indexes! • How is Grid Index stored on disk? V1 V2 V3 Like Array... X1 X2 X3 X4 X1 X2 X3 X4 X1 X2 X3 X4 Problem: • Need regularity so we can compute position of Vi,Xj entry 18
  • 19. Solution: Use Indirection X1 X2 X3 Buckets -- V1 -- -- V2 -- -- -- V3 *Grid only -- V4 -- -- contains pointers to buckets -- -- -- -- Buckets -- -- 19
  • 20. With indirection: • Grid can be regular without wasting space • We do have price of indirection 20
  • 21. Can also index grid on value ranges Salary Grid 0-20K 1 20K-50K 2 50K- 3 8 1 2 3 Linear Scale Toy Sales Personnel 21
  • 22. Grid files Good for multiple-key search + Space, management overhead - (nothing is free) Need partitioning ranges that evenly - split keys 22
  • 24. Partitioned hash function Idea: 010110 1110010 Key1 Key2 h1 h2 24
  • 25. EX: h1(toy) =0 000 h1(sales) =1 001 h1(art) =1 010 . 011 . h2(10k) =01 100 h2(20k) =11 101 h2(30k) =01 110 h2(40k) =00 111 . . Fred,toy,10k,Joe,sales,10k Insert Sally,art,30k 25
  • 26. EX: h1(toy) =0 000 h1(sales) =1 001 Fred h1(art) =1 010 . 011 . h2(10k) =01 100 h2(20k) =11 101 JoeSally h2(30k) =01 110 h2(40k) =00 111 . . Fred,toy,10k,Joe,sales,10k Insert Sally,art,30k 26
  • 27. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales ∧ Sal=40k 27
  • 28. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales ∧ Sal=40k 28
  • 29. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Sal=30k 29
  • 30. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Sal=30k look here 30
  • 31. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales 31
  • 32. h1(toy) =0 000 Fred h1(sales) =1 001 JoeJan h1(art) =1 010 Mary . 011 . h2(10k) =01 100 Sally h2(20k) =11 101 h2(30k) =01 110 TomBill h2(40k) =00 111 Andy . . • Find Emp. with Dept. = Sales look here 32
  • 33. Tree-like Structures • Multiple-key indexes • Kd-trees • Quad trees • R-trees
  • 34. Multiple-key indexes • Several attributes representing dimensions of data points • Multiple key index is an index of indexes in which the nodes at each level are indexes for one attribute.
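  • 35. Strategy: • Multiple-key index • One idea: a first-level index I1 on one attribute whose entries point to second-level indexes I2, I3, ... on the other attribute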
  • 36. Example • A Dept index (Art, Sales, Toy) points to one Salary index per department (salary values shown include 10k, 15k, 17k, 21k and 12k, 15k, 15k, 19k) • Example record: Name=Joe, DEPT=Sales, SAL=15k
  • 37. For which queries is this index good? • Find RECs Dept = “Sales” ∧ SAL = 20k • Find RECs Dept = “Sales” ∧ SAL 20k • Find RECs Dept = “Sales” • Find RECs SAL = 20k
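A minimal sketch of the index-of-indexes idea, assuming a dictionary as the first-level Dept index and a sorted list as each second-level Salary index. Apart from Joe (Sales, 15k), the records and all function names are hypothetical, and salaries are stored as integers in thousands.

```python
from bisect import bisect_left, insort

# First level: dept -> second-level index.
# Second level: list of (salary, name) pairs kept in sorted order.
index = {}


def insert(name, dept, sal):
    insort(index.setdefault(dept, []), (sal, name))


def find(dept, sal_low=None, sal_high=None):
    """Records of one department, optionally limited to a salary range."""
    entries = index.get(dept, [])
    # (sal_low,) sorts before every (sal_low, name), so this finds the first match.
    start = 0 if sal_low is None else bisect_left(entries, (sal_low,))
    found = []
    for sal, name in entries[start:]:
        if sal_high is not None and sal > sal_high:
            break
        found.append((name, dept, sal))
    return found


def find_by_salary(sal):
    """No department given: every second-level index has to be probed."""
    return [rec for dept in sorted(index) for rec in find(dept, sal, sal)]


insert("Joe", "Sales", 15)
insert("Ann", "Sales", 21)   # hypothetical record
insert("Bob", "Toy", 15)     # hypothetical record
insert("Eve", "Art", 10)     # hypothetical record

print(find("Sales", 15, 15))   # [('Joe', 'Sales', 15)]
print(find("Sales", 20))       # [('Ann', 'Sales', 21)]
print(find("Sales"))           # [('Joe', 'Sales', 15), ('Ann', 'Sales', 21)]
print(find_by_salary(15))      # [('Joe', 'Sales', 15), ('Bob', 'Toy', 15)]
```

In this sketch the queries that name a department follow a single path through the first level, whereas the salary-only query visits every department's salary index, which suggests why the order of attributes matters.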
  • 38. KD-trees • A k-dimensional (kd) tree generalizes the binary search tree to multidimensional data. • A kd-tree is a binary tree in which each interior node has an associated attribute “a” and a value “v” that splits the data into two parts. • The attributes at different levels of the tree are different, with the levels rotating among the attributes of all dimensions.
  • 39. Interesting application: • Geographic data, plotted on x and y axes • DATA: <X1,Y1, Attributes>, <X2,Y2, Attributes>, ...
  • 40. Queries: • What city is at Xi,Yi? • What is within 5 miles of Xi,Yi? • Which is the closest point to Xi,Yi?
  • 41-46. Example [Figure: fifteen points labeled a to o in the x-y plane, partitioned step by step by split lines; the split values shown include 5, 10, 15, 20, 25, 30, 35 and 40, and the resulting kd-tree holds the points in its leaves] • Search points near f • Search points near b
  • 47. Queries • Find points with Yi 20 • Find points with Xi 5 • Find points “close” to i = (12,38) • Find points “close” to b = (7,24)
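The kd-tree described on slide 38 can be sketched as below: interior nodes carry a split attribute and value, the attribute alternates between x and y by level, and points live in the leaves. This is an illustrative sketch only. The median split rule, the leaf size of 2, the dictionary node layout and all sample points except i = (12,38) and b = (7,24) are assumptions.

```python
def build(points, depth=0, leaf_size=2):
    """Build a 2-d tree; leaves hold at most leaf_size points where possible."""
    if len(points) <= leaf_size:
        return {"leaf": sorted(points)}
    axis = depth % 2                      # 0 splits on x, 1 splits on y
    ordered = sorted(points, key=lambda p: p[axis])
    value = ordered[len(ordered) // 2][axis]
    left = [p for p in ordered if p[axis] < value]
    right = [p for p in ordered if p[axis] >= value]
    if not left:                          # all points tie on this axis: stop here
        return {"leaf": sorted(points)}
    return {
        "axis": axis,
        "value": value,
        "left": build(left, depth + 1, leaf_size),
        "right": build(right, depth + 1, leaf_size),
    }


def range_search(node, lo, hi):
    """Points p with lo[d] <= p[d] <= hi[d] in both dimensions d."""
    if "leaf" in node:
        return [p for p in node["leaf"]
                if all(lo[d] <= p[d] <= hi[d] for d in (0, 1))]
    axis, value = node["axis"], node["value"]
    found = []
    if lo[axis] < value:                  # left side holds coordinates < value
        found += range_search(node["left"], lo, hi)
    if hi[axis] >= value:                 # right side holds coordinates >= value
        found += range_search(node["right"], lo, hi)
    return found


# (12, 38) and (7, 24) come from slide 47; the other points are made up.
points = [(7, 24), (12, 38), (3, 10), (18, 5), (25, 30), (30, 12), (9, 17), (22, 40)]
tree = build(points)

print(range_search(tree, (5, 20), (15, 40)))   # [(7, 24), (12, 38)]
```

A one-sided condition on a single coordinate can be expressed by passing `float("-inf")` or `float("inf")` for the bounds that are not restricted. The search prunes a whole subtree whenever the query range lies entirely on one side of the node's split value.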
  • 48. Quad trees • A quad tree divides multidimensional space into quadrants and divides each quadrant the same way if it holds too many points. • If the points in a square – fit in a block, the square is a leaf node – no longer fit in a block, the square becomes an interior node whose four quadrants are its children.
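The splitting rule above can be sketched as follows, assuming a two-dimensional square region and a block capacity of 2 points. The class name `QuadTree`, the minimum-size guard and the sample points are all hypothetical choices made for illustration.

```python
class QuadTree:
    """Square region that splits into four quadrants when it overflows."""

    def __init__(self, x0, y0, size, capacity=2):
        self.x0, self.y0, self.size = x0, y0, size
        self.capacity = capacity      # points that fit in one block
        self.points = []              # used only while this node is a leaf
        self.children = None          # four sub-quadrants once split

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # The size guard stops endless splitting on duplicate points.
        if len(self.points) > self.capacity and self.size > 1:
            self._split()

    def _split(self):
        half = self.size / 2
        self.children = [
            QuadTree(self.x0 + dx * half, self.y0 + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        old, self.points = self.points, []
        for p in old:
            self._child_for(p).insert(p)

    def _child_for(self, p):
        half = self.size / 2
        col = 1 if p[0] >= self.x0 + half else 0
        row = 1 if p[1] >= self.y0 + half else 0
        return self.children[row * 2 + col]

    def leaves(self):
        """Point lists of all leaf squares, in quadrant order."""
        if self.children is None:
            return [self.points]
        return [pts for child in self.children for pts in child.leaves()]


tree = QuadTree(0, 0, 100)
for p in [(10, 10), (20, 20), (80, 80), (60, 10)]:
    tree.insert(p)

print(tree.leaves())   # [[(10, 10), (20, 20)], [(60, 10)], [], [(80, 80)]]
```

The third insertion overflows the root, which then becomes an interior node with four children. Quadrants that remain within capacity stay leaves, including the empty one.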
  • 50. R-tree • Captures the spirit of a B-tree for multidimensional data. • Represents a collection of regions by grouping them into a hierarchy of larger regions. • Data is divided into regions. • An interior node corresponds to an interior region – The region can be of any shape – Rectangles are popular – Its children correspond to sub-regions.
  • 52. Bitmap indexes • Assume that records have permanent numbers 1, 2, ..., n. • A bitmap index for field F is a collection of bit vectors of length n, one for each value that may appear in F. • The vector for value v has 1 in position i if the i’th record has v in field F, and 0 there if not. • Example for fields F and G: – Records: (30, foo), (30, bar), (40, baz), (50, foo), (40, bar), (30, baz) – The bitmap index for F has three vectors of 6 bits each: 110001 for 30, 001010 for 40, and 000100 for 50. – The bitmap index for G also has three vectors: 100100 for foo, 010010 for bar, and 001001 for baz.
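A short sketch of how the bit vectors in this example can be built. The six records are the ones from the slide; the function name `build_bitmap_index` and the use of strings for bit vectors are assumptions made for readability.

```python
records = [(30, "foo"), (30, "bar"), (40, "baz"), (50, "foo"), (40, "bar"), (30, "baz")]


def build_bitmap_index(records, field):
    """Map each value of the given field to a bit string of length n."""
    n = len(records)
    index = {}
    for i, rec in enumerate(records):
        bits = index.setdefault(rec[field], ["0"] * n)
        bits[i] = "1"                 # record i+1 has this value
    return {value: "".join(bits) for value, bits in index.items()}


f_index = build_bitmap_index(records, 0)
g_index = build_bitmap_index(records, 1)

print(f_index)   # {30: '110001', 40: '001010', 50: '000100'}
print(g_index)   # {'foo': '100100', 'bar': '010010', 'baz': '001001'}
```

Every record sets exactly one bit per field, so the vectors of one index never overlap and together cover all n positions.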
  • 53. Bitmap indexes: partial match • Bitmaps allow partial match queries to be answered quickly and efficiently. • Example: – Movie(title, year, length, studioName) – SELECT title FROM Movie WHERE studioName = 'Disney' AND year = 1965; • If we have bitmap indexes on studioName and year, then the intersection (bitwise AND) of the two bit vectors gives the result. • Bit vectors do not occupy much space.
  • 54. Bitmap indexes: range queries • Example: consider the gold jewelry data of twelve (age, salary) points, with salary in thousands – 1: (25,60), 2: (45,60), 3: (50,75), 4: (50,100), 5: (50,120), 6: (70,110), 7: (85,140), 8: (30,260), 9: (25,400), 10: (45,350), 11: (50,275), 12: (60,260) • Age has seven different values – 25 (100000001000), 30 (000000010000), 45 (010000000100), 50 (001110000010), 60 (000000000001), 70 (000001000000), 85 (000000100000) • Salary has 10 different values – 60 (110000000000), 75 (001000000000), 100 (000100000000) – 110 (000001000000), 120 (000010000000), 140 (000000100000) – 260 (000000010001), 275 (000000000010), 350 (000000000100) – 400 (000000001000)
  • 55. Example continued • Find the jewelry buyers with an age in the range 45-55 and a salary in the range 100-200. • Find the bit vectors for the age values in the range and take their bitwise OR – 010000000100 (for 45) and 001110000010 (for 50) – Result: 011110000110 • Find the bit vectors for the salaries between 100 and 200 thousand. – There are four: 100, 110, 120, and 140; their bitwise OR is 000111100000 • Take the AND of both bit vectors – 000110000000 – The two records in the range are 4 and 5: (50,100) and (50,120).
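The OR-then-AND procedure can be checked with a short sketch. The twelve points are the ones from slide 54; representing each bit vector as a Python integer with record 1 in the leftmost bit is an assumption, as are the function names.

```python
points = [(25, 60), (45, 60), (50, 75), (50, 100), (50, 120), (70, 110),
          (85, 140), (30, 260), (25, 400), (45, 350), (50, 275), (60, 260)]
n = len(points)


def bitmap_index(field):
    """Map each value of the field to an integer used as an n-bit vector."""
    index = {}
    for i, rec in enumerate(points):
        # Record 1 corresponds to the leftmost (most significant) bit.
        index[rec[field]] = index.get(rec[field], 0) | (1 << (n - 1 - i))
    return index


def range_vector(index, low, high):
    """Bitwise OR of the vectors of all values in [low, high]."""
    vec = 0
    for value, bits in index.items():
        if low <= value <= high:
            vec |= bits
    return vec


def show(vec):
    return format(vec, f"0{n}b")


age_index = bitmap_index(0)
salary_index = bitmap_index(1)

ages = range_vector(age_index, 45, 55)
salaries = range_vector(salary_index, 100, 200)
both = ages & salaries
matches = [points[i] for i in range(n) if (both >> (n - 1 - i)) & 1]

print(show(ages))       # 011110000110
print(show(salaries))   # 000111100000
print(show(both))       # 000110000000
print(matches)          # [(50, 100), (50, 120)]
```

The index itself only needs the distinct values and their vectors; the records are fetched at the very end, for the positions that survive the AND.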
  • 56. Compressed bitmaps • If the number of different values is large, then 1’s are rare in each bit vector. • Run-length coding is used – A run is a sequence of 0’s followed by a 1. – Example: 000101 has two runs, of lengths 3 and 1, whose binary representations are 11 and 1. Simply concatenating them gives 111, which cannot be decoded unambiguously, so each run length must also carry its own bit count. • To save space, bit vectors with very few 1’s are compressed using run-length coding.
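A minimal sketch of one self-delimiting run-length scheme. The prefix rule used here is an assumption that goes beyond the slide: a run of i zeros is written as i in binary (j bits), preceded by j-1 ones and a single 0 so that a decoder knows how many bits to read. The function names are hypothetical.

```python
def encode(bits):
    """Encode a bit string run by run; trailing 0's after the last 1 are dropped."""
    out = []
    run = 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            binary = format(run, "b")                  # run length in binary, j bits
            out.append("1" * (len(binary) - 1) + "0" + binary)
            run = 0
    return "".join(out)


def decode(code, n):
    """Rebuild a bit string of length n from its run-length code."""
    bits = []
    pos = 0
    while pos < len(code):
        j = 1
        while code[pos] == "1":                        # unary part: j-1 ones
            j += 1
            pos += 1
        pos += 1                                       # skip the terminating 0
        run = int(code[pos:pos + j], 2)                # next j bits: the run length
        pos += j
        bits.append("0" * run + "1")
    return "".join(bits).ljust(n, "0")                 # restore dropped trailing 0's


print(encode("000101"))        # 101101
print(decode("101101", 6))     # 000101
```

With this rule the runs 3 and 1 become 1011 and 01, so the vector 000101 is stored as 101101. On such a short vector nothing is saved; the benefit appears for long vectors with few 1’s, where each long run of 0’s shrinks to a number of bits roughly proportional to the logarithm of its length.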
  • 57. Finding bit vectors • Use any index technique to find the values. • Go from each value to its bit vector. • A B-tree is a good choice.