Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Hash-Based Indexes
Chapter 11
Introduction
• Hash-based indexes are best for equality selections; they cannot support range searches.
Static Hashing
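• The number of primary pages is fixed; they are allocated sequentially and never de-allocated. Overflow pages are added if needed.
• h(k) mod M = bucket to which the data entry with key k belongs (M = # of buckets).
[Figure: key is passed to h; h(key) mod M selects one of the primary bucket pages 0 ... M-1, each of which may have a chain of overflow pages.]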
Static Hashing (Contd.)
• Buckets contain data entries.
• The hash function works on the search key field of record r. It must distribute values over the range 0 ... M-1.
  • h(key) = (a * key + b) usually works well; the bucket is then h(key) mod M.
  • a and b are constants; a lot is known about how to tune h.
• Long overflow chains can develop and degrade performance.
  • Extendible and Linear Hashing: dynamic techniques to fix this problem.
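The behaviour described above can be sketched in a few lines. This is only an illustrative in-memory model, not code from the text: the class name `StaticHashFile`, the page capacity, and the default constants `a` and `b` are all assumptions chosen for the demo.

```python
class StaticHashFile:
    """Toy static hash file: M primary pages, each with a chain of overflow pages."""

    def __init__(self, num_buckets, page_capacity, a=31, b=7):
        self.M = num_buckets            # number of primary pages, fixed for good
        self.capacity = page_capacity   # data entries per page (assumed value)
        self.a, self.b = a, b           # hypothetical constants for h
        # Each bucket is a list of pages; page 0 is primary, the rest overflow.
        self.buckets = [[[]] for _ in range(num_buckets)]

    def bucket_of(self, key):
        # h(key) = a * key + b, then mod M picks the bucket.
        return (self.a * key + self.b) % self.M

    def insert(self, key):
        pages = self.buckets[self.bucket_of(key)]
        if len(pages[-1]) == self.capacity:
            pages.append([])            # last page is full: chain an overflow page
        pages[-1].append(key)

    def search(self, key):
        """Return (found, pages_read); cost grows with the overflow chain length."""
        pages_read = 0
        for page in self.buckets[self.bucket_of(key)]:
            pages_read += 1
            if key in page:
                return True, pages_read
        return False, pages_read


# Keys that all collide in bucket 0 build a chain of three pages.
f = StaticHashFile(num_buckets=4, page_capacity=2, a=1, b=0)
for k in (0, 4, 8, 12, 16):
    f.insert(k)
print(f.search(16))   # (True, 3): primary page plus two overflow pages
```

The number of pages read by `search` is what degrades as chains grow, which is the cost the dynamic schemes aim to bound.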
Extendible Hashing
• Situation: a bucket (primary page) becomes full. Why not re-organize the file by doubling the # of buckets?
  • Reading and writing all pages is expensive!
  • Idea: use a directory of pointers to buckets; double the # of buckets by doubling the directory, splitting just the bucket that overflowed!
  • The directory is much smaller than the file, so doubling it is much cheaper. Only one page of data entries is split. No overflow page!
  • The trick lies in how the hash function is adjusted!
Example
• Directory is an array of size 4.
• To find the bucket for r, take the last 'global depth' # of bits of h(r); we denote r by h(r).
  • If h(r) = 5 = binary 101, it is in the bucket pointed to by 01.
• Insert: if the bucket is full, split it (allocate a new page, re-distribute the entries).
• If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing the global depth with the local depth of the split bucket.)
[Figure: DIRECTORY (GLOBAL DEPTH 2) pointing to DATA PAGES; every bucket has LOCAL DEPTH 2]
  00 -> Bucket A: 4* 12* 32* 16*
  01 -> Bucket B: 1* 5* 21* 13*
  10 -> Bucket C: 10*
  11 -> Bucket D: 15* 7* 19*
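Taking the last 'global depth' bits is a single mask operation. The sketch below is illustrative only; the function name `directory_slot` and the list standing in for the directory are assumptions, with bucket labels taken from the example.

```python
def directory_slot(hash_value: int, global_depth: int) -> int:
    """Return the directory index: the last `global_depth` bits of the hash value."""
    return hash_value & ((1 << global_depth) - 1)


# Directory from the example: slots 00, 01, 10, 11 point to buckets A, B, C, D.
directory = ["A", "B", "C", "D"]

slot = directory_slot(5, global_depth=2)      # 5 = binary 101
print(format(slot, "02b"), directory[slot])   # prints "01 B"
```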
Insert h(r)=20 (Causes Doubling)
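[Figure, first panel: Bucket A has been split but the directory is not yet doubled; GLOBAL DEPTH 2, all LOCAL DEPTHs shown as 2]
  00 -> Bucket A: 32* 16*
  01 -> Bucket B: 1* 5* 21* 13*
  10 -> Bucket C: 10*
  11 -> Bucket D: 15* 7* 19*
  (no directory slot yet) Bucket A2 ('split image' of Bucket A): 4* 12* 20*
[Figure, second panel: after doubling the directory; GLOBAL DEPTH 3]
  000 -> Bucket A (LOCAL DEPTH 3): 32* 16*
  001 -> Bucket B (LOCAL DEPTH 2): 1* 5* 21* 13*
  010 -> Bucket C (LOCAL DEPTH 2): 10*
  011 -> Bucket D (LOCAL DEPTH 2): 15* 7* 19*
  100 -> Bucket A2 ('split image' of Bucket A, LOCAL DEPTH 3): 4* 12* 20*
  101 -> Bucket B
  110 -> Bucket C
  111 -> Bucket D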
Points to Note
• 20 = binary 10100. The last 2 bits (00) tell us r belongs in A or A2. The last 3 bits are needed to tell which.
  • Global depth of directory: max # of bits needed to tell which bucket an entry belongs to.
  • Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket.
• When does a bucket split cause directory doubling?
  • Before the insert, local depth of the bucket = global depth. The insert causes local depth to become > global depth; the directory is doubled by copying it over and 'fixing' the pointer to the split image page. (Use of least significant bits enables efficient doubling via copying of the directory!)
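The split and doubling rules above can be replayed on the running example with a small in-memory model. This is a sketch under stated assumptions, not the text's implementation: the names `Bucket` and `ExtendibleHashIndex` are invented here, a page holds 4 entries, hash values are inserted directly, and all values are assumed distinct (many entries with one hash value would make the retry loop split forever, which is the duplicate problem the slides warn about).

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.entries = []


class ExtendibleHashIndex:
    def __init__(self, global_depth=2, capacity=4):
        self.global_depth = global_depth
        self.capacity = capacity      # entries per page (assumed)
        self.directory = [Bucket(global_depth) for _ in range(1 << global_depth)]

    def _slot(self, h):
        return h & ((1 << self.global_depth) - 1)   # last global_depth bits

    def insert(self, h):
        bucket = self.directory[self._slot(h)]
        if len(bucket.entries) < self.capacity:
            bucket.entries.append(h)
            return
        if bucket.local_depth == self.global_depth:
            # Doubling by copying: slot i and slot i + old_size share a bucket.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split: one more bit now decides between the bucket and its split image.
        bucket.local_depth += 1
        image = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        for i, b in enumerate(self.directory):
            if b is bucket and (i & new_bit):
                self.directory[i] = image            # 'fix' pointers to the image
        old_entries, bucket.entries = bucket.entries, []
        for e in old_entries:
            self.directory[self._slot(e)].entries.append(e)
        self.insert(h)   # retry; assumes distinct hash values (see lead-in)


index = ExtendibleHashIndex(global_depth=2, capacity=4)
for value in (4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19):
    index.insert(value)
index.insert(20)                          # bucket A is full and local = global
print(index.global_depth)                 # 3
print(index.directory[0b000].entries)     # [32, 16]     (Bucket A)
print(index.directory[0b100].entries)     # [4, 12, 20]  (Bucket A2)
```

Buckets B, C, and D keep local depth 2 after the insert, so each is reachable from two directory slots.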
Directory Doubling
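• Why use least significant bits in directory?
  • Allows for doubling via copying!
[Figure: a directory holding the entry 6* (6 = binary 110) is doubled from depth 1 (slots 0, 1) to depth 2 and then depth 3, shown for two ways of indexing.]
  Least Significant: slots 00 01 10 11 become 000 001 010 011 100 101 110 111
  vs.
  Most Significant: slots 00 10 01 11 become 000 100 010 110 001 101 011 111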
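The difference between the two indexing choices shows up in how the doubled directory is built. The sketch below is illustrative only; the function names are assumptions, and the directory is modelled as a list of bucket labels in slot order.

```python
def double_lsb(directory):
    """Least significant bits: slot i and slot i + len(directory) agree on the
    old low bits, so the new half is a verbatim copy appended at the end."""
    return directory + directory


def double_msb(directory):
    """Most significant bits: slots 2i and 2i + 1 share the old high bits, so
    every pointer must be duplicated in place (entries are interleaved)."""
    return [bucket for bucket in directory for _ in range(2)]


old = ["A", "B", "C", "D"]
print(double_lsb(old))   # ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D']
print(double_msb(old))   # ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']
```

With least significant bits the existing part of the directory stays where it is and only a copy is appended, which is why the slides call this doubling via copying.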
Comments on Extendible Hashing
• If the directory fits in memory, an equality search is answered with one disk access; else two.
  • A 100MB file with 100 bytes/rec and 4K pages contains 1,000,000 records (as data entries), i.e. 40 per page, and so 25,000 directory elements; chances are high that the directory will fit in memory.
  • The directory grows in spurts and, if the distribution of hash values is skewed, it can grow large.
  • Multiple entries with the same hash value cause problems!
• Delete: if removal of a data entry makes a bucket empty, it can be merged with its 'split image'. If each directory element points to the same bucket as its split image, the directory can be halved.
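The sizing estimate above can be checked with a few lines of arithmetic. The variable names are illustrative; reading 100MB as 10^8 bytes and 4K as 4096 bytes, and the 4-byte pointer size used for the last line, are assumptions not stated in the slides. The count also assumes full pages and one directory element per page.

```python
file_bytes = 100 * 10**6     # 100MB, read as 10^8 bytes (assumption)
record_bytes = 100
page_bytes = 4096            # "4K" page (assumption: 4096 bytes)
pointer_bytes = 4            # hypothetical size of one directory element

records = file_bytes // record_bytes               # 1,000,000 data entries
entries_per_page = page_bytes // record_bytes      # 40 whole entries fit on a page
directory_elements = records // entries_per_page   # 25,000 pages, one element each

print(records, entries_per_page, directory_elements)
print(directory_elements * pointer_bytes, "bytes of directory")   # 100000 bytes
```

Under these assumptions the directory is around 100 KB, which supports the claim that it is likely to fit in memory.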
Linear Hashing
• This is another dynamic hashing scheme, an alternative to Extendible Hashing.
• LH handles the problem of long overflow chains without using a directory, and handles duplicates.
• Idea: use a family of hash functions h_0, h_1, h_2, ...
  • h_i(key) = h(key) mod (2^i N); N = initial # of buckets
  • h is some hash function (its range is not 0 to N-1)
  • If N = 2^(d_0) for some d_0, h_i consists of applying h and looking at the last d_i bits, where d_i = d_0 + i.
  • h_{i+1} doubles the range of h_i (similar to directory doubling)
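The hash family can be written directly from its definition. In this sketch the function name `h_i` and the sample values are illustrative, and the base hash h is taken to be the identity on small integers, which is an assumption made only for the demo.

```python
def h_i(hash_value: int, i: int, n_initial: int) -> int:
    """h_i(key) = h(key) mod (2^i * N), with hash_value standing in for h(key)."""
    return hash_value % ((2 ** i) * n_initial)


N = 4   # initial number of buckets, so d_0 = 2
for value in (32, 44, 36):
    print(value, h_i(value, 0, N), h_i(value, 1, N))
# 32 0 0
# 44 0 4
# 36 0 4

# With N a power of two, h_i is the same as keeping the last d_0 + i bits.
d0 = 2
assert all(h_i(v, 1, N) == v & ((1 << (d0 + 1)) - 1) for v in range(64))
```

Entries that agree under h_0 are separated by h_1 into a bucket and the bucket N positions later, which is what a split relies on.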
Linear Hashing (Contd.)
• A directory is avoided in LH by using overflow pages and choosing the bucket to split round-robin.
  • Splitting proceeds in 'rounds'. A round ends when all N_R initial (for round R) buckets are split. Buckets 0 to Next-1 have been split; Next to N_R - 1 are yet to be split.
  • The current round number is Level.
  • Search: to find the bucket for data entry r, find h_Level(r):
    • If h_Level(r) is in the range Next to N_R - 1, r belongs here.
    • Else, r could belong to bucket h_Level(r) or bucket h_Level(r) + N_R; must apply h_{Level+1}(r) to find out.
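The search rule reduces to a comparison with Next. A minimal sketch follows, assuming hash values are given directly and that N_R = N * 2^Level; the function and parameter names are illustrative, not from the text.

```python
def lh_bucket(hash_value: int, level: int, next_ptr: int, n_initial: int) -> int:
    """Return the bucket number that may hold an entry with this hash value."""
    n_round = n_initial * 2 ** level      # N_R: buckets at the start of this round
    bucket = hash_value % n_round         # h_Level
    if bucket < next_ptr:
        # This bucket was already split in the current round, so the entry is
        # either here or in the split image; h_{Level+1} decides which.
        bucket = hash_value % (2 * n_round)
    return bucket


# Level = 0, N = 4, Next = 1: only bucket 0 has been split so far.
print(lh_bucket(32, level=0, next_ptr=1, n_initial=4))   # 0
print(lh_bucket(44, level=0, next_ptr=1, n_initial=4))   # 4 (split image of 0)
print(lh_bucket(9, level=0, next_ptr=1, n_initial=4))    # 1 (not yet split)
```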
Overview of LH File
• In the middle of a round.
[Figure: layout of an LH file, top to bottom]
  • Buckets 0 to Next-1: buckets split in this round. If h_Level(search key value) is in this range, must use h_{Level+1}(search key value) to decide if the entry is in the 'split image' bucket.
  • Bucket Next: bucket to be split.
  • Buckets 0 to N_R - 1 together: buckets that existed at the beginning of this round; this is the range of h_Level.
  • 'Split image' buckets: created (through splitting of other buckets) in this round.
Linear Hashing (Contd.)
• Insert: find the bucket by applying h_Level / h_{Level+1}:
  • If the bucket to insert into is full:
    • Add an overflow page and insert the data entry.
    • (Maybe) split the Next bucket and increment Next.
• Any criterion can be chosen to 'trigger' a split.
• Since buckets are split round-robin, long overflow chains don't develop!
• Doubling of the directory in Extendible Hashing is similar; switching of hash functions is implicit in how the # of bits examined is increased.
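The insert procedure can be sketched as a small in-memory model. Everything named here is an assumption for illustration: the class `LinearHashFile`, a page capacity of 4 entries, hash values inserted directly, and the split trigger (split whenever an insert needs an overflow page, which is one of the criteria the slides allow). Each bucket is a single Python list standing for its primary page plus any overflow chain.

```python
class LinearHashFile:
    def __init__(self, n_initial=4, capacity=4):
        self.n0 = n_initial
        self.capacity = capacity          # entries per primary page (assumed)
        self.level = 0
        self.next_ptr = 0                 # "Next": the bucket to split next
        self.buckets = [[] for _ in range(n_initial)]

    def _n_round(self):
        return self.n0 * 2 ** self.level  # N_R for the current round

    def _bucket_no(self, h):
        b = h % self._n_round()                # h_Level
        if b < self.next_ptr:
            b = h % (2 * self._n_round())      # h_{Level+1} for split buckets
        return b

    def insert(self, h):
        bucket = self.buckets[self._bucket_no(h)]
        needs_overflow = len(bucket) >= self.capacity
        bucket.append(h)                  # beyond capacity = on an overflow page
        if needs_overflow:
            self._split()                 # assumed trigger: overflow page added

    def _split(self):
        n = self._n_round()
        old = self.buckets[self.next_ptr]
        # Redistribute with h_{Level+1}; the split image is bucket Next + N_R,
        # which is always the next bucket to be appended.
        self.buckets[self.next_ptr] = [e for e in old if e % (2 * n) == self.next_ptr]
        self.buckets.append([e for e in old if e % (2 * n) != self.next_ptr])
        self.next_ptr += 1
        if self.next_ptr == n:            # every initial bucket split: new round
            self.level += 1
            self.next_ptr = 0


f = LinearHashFile(n_initial=4, capacity=4)
f.buckets = [[32, 44, 36], [9, 25, 5], [14, 18, 10, 30], [31, 35, 7, 11]]
f.insert(43)                              # bucket 3 is full; bucket 0 gets split
print(f.next_ptr, f.buckets[0], f.buckets[4], f.buckets[3])
# 1 [32] [44, 36] [31, 35, 7, 11, 43]
```

Note that the bucket that overflowed (3) is not the one that is split (0); the overflow is absorbed by a chain until round-robin splitting reaches that bucket.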
Example of Linear Hashing
• On split, h_{Level+1} is used to re-distribute entries.
[Figure, first panel: Level=0, N=4, Next=0. The h1 and h0 columns are for illustration only; the primary pages are the actual contents of the linear hashed file. A data entry r with h(r)=5 is written 5*.]
  h1   h0   PRIMARY PAGES
  000  00   32* 44* 36*
  001  01   9* 25* 5*
  010  10   14* 18* 10* 30*
  011  11   31* 35* 7* 11*
[Figure, second panel: Level=0, Next=1, after 43* is added]
  h1   h0   PRIMARY PAGES       OVERFLOW PAGES
  000  00   32*
  001  01   9* 25* 5*
  010  10   14* 18* 10* 30*
  011  11   31* 35* 7* 11*      43*
  100  00   44* 36*
Example: End of a Round
[Figure, first panel: Level=0, Next=3]
  h1   h0   PRIMARY PAGES       OVERFLOW PAGES
  000  00   32*
  001  01   9* 25*
  010  10   66* 18* 10* 34*
  011  11   31* 35* 7* 11*      43*
  100  00   44* 36*
  101  01   5* 37* 29*
  110  10   14* 30* 22*
[Figure, second panel: Level=1, Next=0, after 50* is added]
  h1   h0   PRIMARY PAGES       OVERFLOW PAGES
  000  00   32*
  001  01   9* 25*
  010  10   66* 18* 10* 34*     50*
  011  11   43* 35* 11*
  100  00   44* 36*
  101  01   5* 37* 29*
  110  10   14* 30* 22*
  111  11   31* 7*
LH Described as a Variant of EH
• The two schemes are actually quite similar:
  • Begin with an EH index where the directory has N elements.
  • Use overflow pages, split buckets round-robin.
  • First split is at bucket 0. (Imagine the directory being doubled at this point.) But elements <1,N+1>, <2,N+2>, ... are the same. So, need only create directory element N, which differs from 0, now.
    • When bucket 1 splits, create directory element N+1, etc.
• So, the directory can double gradually. Also, primary bucket pages are created in order. If they are allocated in sequence too (so that finding the i'th is easy), we actually don't need a directory! Voila, LH.
Summary
• Hash-based indexes: best for equality searches, cannot support range searches.
• Static Hashing can lead to long overflow chains.
• Extendible Hashing avoids overflow pages by splitting a full bucket when a new data entry is to be added to it. (Duplicates may require overflow pages.)
  • A directory keeps track of buckets and doubles periodically.
  • It can get large with skewed data; additional I/O is needed if it does not fit in main memory.
Summary (Contd.)
• Linear Hashing avoids a directory by splitting buckets round-robin and using overflow pages.
  • Overflow chains are not likely to be long.
  • Space utilization could be lower than in Extendible Hashing, since splits are not concentrated on 'dense' data areas.
    • The criterion for triggering splits can be tuned to trade off slightly longer chains for better space utilization.
• For hash-based indexes, a skewed data distribution is one in which the hash values of data entries are not uniformly distributed!


Editor's Notes

  • #1: The slides for this text are organized into chapters. This lecture covers Chapter 11, and discusses hash-based indexing in depth. It should be covered after Chapter 8, which provides an overview of storage and indexing. At the instructor’s discretion, it can also be omitted without loss of continuity in other parts of the text. (In particular, Chapter 20 can be covered without covering this chapter, though covering this chapter will certainly provide a stronger foundation.)