      | Aug. 2             | Aug. 3             | Aug. 4                   | Aug. 5              | Aug. 6
9:00  | Intro & terminology| TP mons & ORBs     | Logging & res. Mgr.     | Files & Buffer Mgr. | Structured files
11:00 | Reliability        | Locking theory     | Res. Mgr. & Trans. Mgr. | COM+                | Access paths
13:30 | Fault tolerance    | Locking techniques | CICS & TP & Internet    | CORBA/EJB + TP      | Groupware
15:30 | Transaction models | Queueing           | Advanced Trans. Mgr.    | Replication         | Performance & TPC
18:00 | Reception          | Workflow           | Cyberbricks             | Party               | FREE
Access Paths
Chapter 20

Types of Associative Access Paths
Primary index: Given the primary key value, find the tuple.
Secondary index: Given the value of a non-unique attribute, find all qualifying tuples.
Range index: Given a value range of some attribute, find all tuples within that range.
Structure index: Given some tuple, find all structurally related tuples (CODASYL sets, object hierarchies, etc.).

Two Important Techniques
Two basic techniques dominate in modern DBMSs:
Hashing: Use a fixed transformation algorithm to convert the attribute value into a database address.
Tree search: A dynamic search structure is built that guides the search for a given attribute value to the proper database location.
Hashing supports primary indices only. Tree search is more versatile.

The Access Gap
Accessing a tuple in the buffer costs about 2,000 instructions.
Accessing it on disk takes about 25 ms of I/O-related activity;
on a 20 MIPS machine, this translates into 500,000 instructions.
Therefore, one can afford to spend many instructions on an
algorithm that saves one I/O on average.
For access paths, the dominant cost measure is
the number of different pages that need to be
accessed during the search.
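The access-gap arithmetic above can be restated in a few lines (the numbers are the slide's own):

```python
# Access-gap arithmetic from the slide.
BUFFER_ACCESS_INSTR = 2_000   # instructions to reach a tuple in the buffer
IO_TIME_S = 0.025             # 25 ms per disk I/O
MIPS = 20_000_000             # 20 MIPS machine

# Instruction-equivalents of a single I/O:
io_in_instructions = int(IO_TIME_S * MIPS)
ratio = io_in_instructions // BUFFER_ACCESS_INSTR

print(io_in_instructions)  # 500000
print(ratio)               # 250: one I/O costs as much as 250 buffer accesses
```

This is why saving a single I/O justifies a fairly expensive in-memory algorithm.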

[Figure: Folding maps the range of potential key values (the shaded
areas denote used key values) onto the range of positive integers
("the magic of folding"); hashing then maps that range onto the tuple
address space ("the magic of hashing"). Tuples should use up the
available address space as evenly as possible; the white portions
indicate that space utilization will be < 100%.]
Hashing

Folding vs. Hashing
Folding is used to turn an attribute value of
arbitrary length and arbitrary data type into an
unsigned integer whose maximum length is
determined by the instruction set.
Hashing is used to transform the result of
folding into the address of a page that (probably)
holds the tuple with the specified key value.
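Folding is not specified further here; as a sketch, one possible scheme XORs the value's bytes in 4-byte chunks to obtain an unsigned 32-bit integer. The function name and scheme below are illustrative assumptions, not the actual algorithm from the text:

```python
def fold(value: str) -> int:
    """Fold an attribute value of arbitrary length into an unsigned
    32-bit integer by XOR-ing its UTF-8 bytes in 4-byte chunks.
    (Illustrative scheme only; the slide omits the real details.)"""
    data = value.encode("utf-8")
    result = 0
    for i in range(0, len(data), 4):
        result ^= int.from_bytes(data[i:i + 4], "big")
    return result & 0xFFFFFFFF
```

For example, `fold("Bertolucci")` yields a 32-bit value that can then be fed into the hash function.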

Requirement
A good hash function H has to map primary key
values, which are very unevenly distributed over
a large value range, into a tuple address space
that is much smaller (proportional to the number
of existing tuples), such that the resulting
addresses are evenly distributed over the
address range.

Parameters of the Hash Function
K: This is the (folded) key value. It varies
between 0 and 2**32 - 1.
B: Number of pages to be allocated for the file.
Depends on the number of tuples expected.
H: Hash function that performs a mapping of:
[0, 2**32 - 1] -> [0, B-1].

Consequences of the Approach
Contiguous allocation: All B pages must be
allocated in physical contiguity, because the
relative addresses vary between 0 and B-1.
Fixed size: The file size must be determined at
creation time, because changing the size
(i.e. changing B) means changing the hash
function. This in turn requires a complete
reorganization.

Requirements for a Hash Function
Hash-based allocation assumes that it is possible to estimate the number of
tuples T that the relation will have, and that this estimate is not drastically
exceeded.
If a block has length B, and a tuple has an average length of L bytes, then
we need at least S = ⌈T / (B/L)⌉ blocks to store the T tuples.
The required number of blocks (S) is allocated before the first tuple is
stored. It is a good idea to allocate some more blocks (S’ > S) to allow for
unexpected growth.
Then a hash function H is defined, which takes in the value of the primary
key k of the relation and converts it into a number between 1 and S’; this is
the block number where the tuple is to be stored. If K is the set of possible
values for the primary key we have:
H: K → {1, 2, …, S’}
The set of potential values for the primary key attribute will be much larger
than the number of blocks allocated (think, e.g., of ISBNs for the books
relation). So the hash function is a compacting function. For each primary
key value, there is exactly one block it is mapped to. Many different primary
key values are mapped to the same block (n:1 relationship).
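The block estimate S = ⌈T / (B/L)⌉ can be sketched directly; the figures used in the example (T = 100,000 tuples, B = 8192-byte blocks, L = 200-byte tuples) are hypothetical:

```python
import math

def min_blocks(T: int, B: int, L: int) -> int:
    """S = ceil(T / (B/L)): minimum number of blocks needed to store
    T tuples of average length L bytes in blocks of B bytes."""
    tuples_per_block = B // L          # whole tuples that fit in one block
    return math.ceil(T / tuples_per_block)

print(min_blocks(100_000, 8192, 200))  # 2500 (8192 // 200 = 40 tuples/block)
```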

Properties of Hash Functions
A hash function must be easy to compute and must not require access to
any blocks in the database.
It must be able to map a generally very large set of potential primary key
values (remember: primary keys can be constructed by concatenating
several attributes of a relation) into a comparatively small set of block
numbers in which the tuples will be stored.
It must be able to take in primary key values of different data types (integer,
binary, decimal, character, etc.) and map them to the set of integers
between 1 and S’ with equal efficiency.
The formula for estimating S based on the number of tuples, average tuple
length and block length implicitly assumes that all blocks are equally filled,
i.e. that the same number of tuples is mapped to each block. This is the
most difficult requirement, because the primary key values in general are
not equally distributed over their value range: Some parts of the value
range are used, others are not used at all, keys are generated by some
regular mechanism, etc.
To achieve this “hashing” property, different methods exist: table look-up,
base conversion, folding, encryption, division by prime numbers, etc.

A Popular Hash Function
A hash function that most database systems use as a default (if the user
does not specify one) is defined as:
H(k) := k mod d + 1
This requires that k is a positive integer. If the primary key attribute of the
relation does not have the data type integer - it could, for example, be a
name - then it has to be turned into an integer first. The usual way of doing
this is to “fold” the binary representation of the key value such that its length
does not exceed 32 bits. Then these 32 bits are interpreted as an integer; if
it is negative, it is multiplied by -1. Details of folding are omitted here.
For H to be a good hash function, d must be a large enough prime number;
this is explained by detailed number theoretic analyses. We also must make
sure that the number of blocks allocated is about 25% larger than the
minimum requirement.
Summing it up: We first compute S. Then we compute S’ = 1.25*S. Then we
compute d = next_higher_prime (S’). H(k) will then determine the block for
each tuple based on the primary key value.
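The whole recipe (S' = 1.25*S, d = next_higher_prime(S'), H(k) = k mod d + 1) can be sketched as follows; the S value in the example is hypothetical, and a naive trial-division primality test stands in for whatever the system would actually use:

```python
def is_prime(n: int) -> bool:
    # Naive trial division; adequate for the block counts involved here.
    if n < 2:
        return False
    return all(n % i for i in range(2, int(n ** 0.5) + 1))

def next_higher_prime(n: int) -> int:
    p = n + 1
    while not is_prime(p):
        p += 1
    return p

def make_hash(S: int):
    """Default hash recipe from the slide: S' = 1.25*S blocks,
    d = next_higher_prime(S'), H(k) = k mod d + 1."""
    s_prime = int(1.25 * S)
    d = next_higher_prime(s_prime)
    return (lambda k: k % d + 1), d

H, d = make_hash(2500)   # S = 2500 blocks (hypothetical)
print(d)                 # 3137, the next prime above S' = 3125
```

H(k) then always lands in the range 1..d, i.e. a valid block number.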

Average number of overflow pages, by space utilization U and
maximum number of tuples per page (bucket):

  U   |     1      2      3      5     10     20     40
 .50  |  0.500  0.177  0.087  0.031  0.005  0.000  0.000
 .60  |  0.750  0.293  0.158  0.066  0.015  0.002  0.000
 .70  |  1.167  0.494  0.286  0.136  0.042  0.010  0.001
 .80  |  2.000  0.903  0.554  0.289  0.110  0.036  0.009
 .85  |  2.832  1.316  0.827  0.449  0.185  0.069  0.022
 .90  |  4.495  2.146  1.377  0.777  0.345  0.144  0.055
 .92  |  5.740  2.768  1.792  1.024  0.467  0.203  0.083
 .95  |  9.444  4.631  3.035  1.769  0.837  0.386  0.171

Average Number of Overflow Pages

Hashing for Non-Unique Attributes
Let V denote the number of different attribute
values. Then we can distinguish three cases:
V ~ T: The attribute is “almost” unique; a good
hash function should work in that case.
V > B: There are more values than buckets. Can
be made to work, but some buckets may get much
higher utilization than others.
V < B: This is the case where hashing cannot be
used.

[Figure: a) external overflow handling: buckets with separate overflow
pages; b) internal overflow handling: (implicit) pointers established
by the overflow function - in this example the overflow bucket for
bucket b is b+3 mod B. Completely filled buckets and pages are shaded.]
Overflow Handling

Hashing: Summary
For unique attributes, hashing can yield one-access retrieval.
It is critical to find a good hash function to
reduce collisions.
If the original estimate of the file size is wrong,
reorganization is inevitable.
Synchronization at the page level is done using
standard crabbing techniques.
Hashing does not support range queries.

B-Trees
B-Trees consist of two types of nodes:
Leaf nodes: They contain the data, i.e. the tuples
or pointers to the tuples (TIDs).
Index nodes: Index nodes contain reference keys
to direct the search towards the leaves. The data
structure looks like this:
struct {
    char *KeyValue;
    PAGEID PointerToNextNode;
} index_node_structure[];

Rules for Index Nodes
Key values are in sorted order: K0 ≤ K1 ≤ ... ≤ Ki
≤ ... ≤ Kf (f is max. capacity of a node).
For any two adjacent key values Ki, Ki+1 the
pointer Pi points to a node covering all values in
the interval (Ki, Ki+1].
If a search for value v arrives at an index node,
the next node to be visited is pointed to by Pi
such that Ki ≤ v < Ki+1.
K0 is an arbitrary “low value” (smaller than
anything else in that node).
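The search rule (follow Pi such that Ki ≤ v < Ki+1) amounts to a binary search for the rightmost reference key not exceeding v. A minimal sketch, with `float("-inf")` standing in for the "low value" K0 and the keys and pointers chosen for illustration:

```python
from bisect import bisect_right

def next_node(keys, pointers, v):
    """Given the sorted reference keys K0..Kf and pointers P0..Pf of
    one index node, return the pointer Pi with Ki <= v < Ki+1."""
    i = bisect_right(keys, v) - 1   # rightmost key not exceeding v
    return pointers[i]

# K0 is an arbitrary "low value"; float("-inf") plays that role here.
keys = [float("-inf"), 271, 312]
pointers = ["L1", "L2", "L3"]
print(next_node(keys, pointers, 280))  # L2
```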

Properties of a B-Tree
Parameter f is called the fan-out of the tree.
The number of nodes visited from the root to a
leaf is called the height of the tree.
A B-tree is always perfectly balanced, i.e. the
height is the same for all leaves.
Storage utilization is at least 50% for all nodes
except for the root.
Average storage utilization is close to 70%.

[Figure: A simple B-tree with a root node, an index node I, and leaf
nodes L1, L2, L3 holding sorted key values. The shaded parts at the
beginning of the index nodes represent the low value key (K0).]
A Simple B-Tree

Some Observations
B-trees also work for non-unique attributes;
implementation optimizations will be discussed
later on.
The reference keys in the index nodes can be
different from all “real” key values in the leaves;
they only have to guide the search correctly.
The key values at the leaf level are sorted in
ascending order; this supports range queries.

[Figure: Inserting key value 280 into a full leaf under index node I;
shown before and after the split of index node I. After the split,
leaf node L2 holds 271 and 296, a new leaf Ln holds 299 and 303, and
reference key 299 has moved up into the index level.]
Inserting Into a B-Tree

Growing a B-Tree
If the insert leaf is full, allocate a new node,
distribute the values (the new one sorted in
place) evenly across the old leaf and the new
node, move the lowest key value of the new
node up to the index node.
If that index node is full, split it in the same way.
If the root has to be split: Allocate two new
nodes, distribute the key values evenly over
them, put the reference key in the root.
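The leaf-split step above can be sketched as follows; Python lists stand in for pages, and index-node splits and concurrency are ignored:

```python
from bisect import insort

def split_leaf(leaf, new_key):
    """Insert new_key into a full leaf (a sorted list), distribute the
    values evenly across the old leaf and a new node, and return the
    new node plus the separator key that moves up to the index node."""
    insort(leaf, new_key)            # place the new value in sort order
    mid = len(leaf) // 2
    new_node = leaf[mid:]
    del leaf[mid:]
    return new_node, new_node[0]     # lowest key of the new node moves up

leaf = [255, 256, 271, 296]
new_node, separator = split_leaf(leaf, 280)
print(leaf, new_node, separator)     # [255, 256] [271, 280, 296] 271
```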

Deleting Tuples From a B-Tree
To maintain the space utilization guarantees, a
leaf that becomes under-utilized (< 50%) would
have to be merged with its neighbours.
This is a very costly operation; in particular,
synchronization at the page level is very
complicated.
Therefore, most systems let nodes become
empty and discard them when that happens.
Analyses show that this does not deteriorate the
overall B-tree performance.

[Figure: Three ways of representing non-unique attributes: a) each key
value stored once, followed by a list of tuple pointers; b) each key
value with a counter and references into a separate pointer list;
c) the key value repeated with every pointer (23 P1, 23 P2, 23 P3,
28 P4, 29 P5, 29 P6).]
Non-Unique Attributes

The Basic Formula of B-Tree Performance
With N: number of tuples, C: average number of
entries in a leaf, and F: average number of
entries in an index node, the height H of a
B-tree is
H = 1 + logF(N/C)
      Key-sequenced file (C* = 43)       Secondary index (C* = 300)
 H    N(max)            increase         N(max)              increase
 2    12,900            -                90,000              -
 3    3,870,000         3,857,100        27,000,000          26,910,000
 4    1,161,000,000     1,157,130,000    8,100,000,000       8,073,000,000
 5    348,300,000,000   347,139,000,000  2,430,000,000,000   2,421,900,000,000
Some Performance Figures
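The N(max) column follows from inverting the height formula: for a given height H, N(max) = C * F^(H-1). A quick check, assuming a fan-out of F = 300 (an assumption, but one consistent with the table's numbers):

```python
def n_max(C: int, F: int, H: int) -> int:
    """Largest N that still fits in height H: invert H = 1 + logF(N/C)."""
    return C * F ** (H - 1)

print(n_max(43, 300, 2))    # 12900       (key-sequenced file)
print(n_max(300, 300, 3))   # 27000000    (secondary index)
```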

Tuples in the Leaves?
Assuming a tuple is x times longer than a TID, we
get the following estimate:
1 + logF(N/(x·C)) + 1.1 ≤ 1 + logF(N/C).
This transforms into
1.1 ≤ logF x.
When this holds, moving the tuples out of the
leaves improves performance.
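The criterion 1.1 ≤ logF x is easy to evaluate numerically; the F and x values below are hypothetical:

```python
import math

def move_tuples_out(F: float, x: float) -> bool:
    """True when 1.1 <= logF(x), i.e. when keeping only TIDs in the
    leaves (and the tuples elsewhere) improves performance."""
    return 1.1 <= math.log(x, F)

print(move_tuples_out(300, 1000))  # True
print(move_tuples_out(300, 100))   # False
```

With F = 300, the inequality requires a tuple to be several hundred times longer than a TID, so the condition is rarely met.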

[Figure: a) Full key values (Bertolucci, Cooperativa, Copelletti,
Gambogi, ...) stored in all nodes of the B*-tree; b) suffix
compression for key values in high-level nodes (Bert, Cop, Gam, ...).]
Key Compression: Suffix Compression

Synchronization on B-Trees: What Is the
Problem?
B-Trees are fully redundant structures, which
can be reconstructed from the tuples; therefore,
no synchronization should be required at all.
However, some queries operate on the index
only. This requires all operations on B-trees to
be serializable with the operations on the tuples.
Standard two-phase locking with the nodes as
the objects is not feasible for performance
reasons.

[Figure: The search path leads from node Q at level i to node R at
level i+1: 1. get semaphore on Q; 2. follow the search path;
3. get semaphore on R; 4. release the semaphore on Q. This is an
example of the crabbing technique.]
Protecting Tree Traversal

[Figure: A B*-tree leaf with key values and pointer lists
(14: P1 P2; 16: P3; 17: P4 P5 P6; 20: P7; 22: P8 P9 P10).]
B-Trees and Value Locks

Making Lock Names
To implement value locking, we need to build
lock names according to the following rule:
LockN := {TableName, IndexName, KeyValue}.
KeyValue in turn is a composite:
KeyValue := {AttributeValue, TupleIdentifier}.

Key Range Locking on B-Trees
Simple retrieval: k = c
Get a semaphore on leaf page; get S-lock on key
range defined by largest existing value c1 with
c1 ≤ c; hold lock until commit.
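The "largest existing value c1 with c1 ≤ c" can be found by binary search over the sorted leaf entries. A minimal sketch; the key values are taken from the B*-tree figure above and are illustrative:

```python
from bisect import bisect_right

def key_range_lock_target(leaf_keys, c):
    """Return the largest existing value c1 with c1 <= c: the key
    range whose S-lock protects the retrieval k = c.
    Assumes leaf_keys is sorted and some c1 <= c exists."""
    i = bisect_right(leaf_keys, c) - 1
    return leaf_keys[i]

print(key_range_lock_target([14, 16, 17, 20, 22], 18))  # 17
```

Locking 17 protects the whole range up to the next existing key, so a retrieval of the non-existent value 18 is shielded against phantom inserts.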

Key Range Locking on B-Trees
Range retrieval: c1 < k < c2
Get a semaphore on the first leaf page; get an S-lock on
the key range defined by the largest existing value c3 with
c3 < c1; proceed sequentially along leaf level;
request key range S-lock for each new attribute
value up to and including c2; do careful crabbing
across leaf pages; hold S-lock until commit.

Key Range Locking on B-Trees
Insert: [c, ki]
Get X-semaphore on leaf page; find largest
existing value c1 with
c1 < c; request an instant IX-lock on c1; request a
long X-lock on c.

Key Range Locking on B-Trees
Delete: [c, kd]
Get X-semaphore on leaf page; find largest
existing value c1 with c1 < c; if further entries
with value c remain, request a long IX-lock on c;
else request long X-locks on c and c1.

[Figure: Under index page Pi, transaction T1 inserts into page Pa,
causing a split that creates page Pa'; transaction T2 then inserts
into the new page Pa'.]
B-Tree Recovery

B-Tree Recovery Based on Physiological
Logging
Cover all B-tree operations with semaphores on
all affected pages.
For each logical update, a log record with the
logical UNDO operation must be written to the
log.
While the update operation is being performed,
physical REDO log records are written.
After all REDO records are safely in the log, the
exclusive semaphores can be released.

The Two Phases of B-Tree-Recovery
Phase 1: Go forward through the log up to its
current end, applying all REDO records to the
tree.
Phase 2: Go backward to the begin-of-transaction
record of the oldest incomplete transaction,
executing the UNDO operations on the tree for
all losers along the way.

[Figure: The hash function ("the magic of hashing") maps a key value
into a pointer table (directory), whose entries point to the
buckets (pages).]
Other Access Path Methods:
Extendible Hashing

New Techniques
Grid files: Symmetric multi-dimensional point
access. Can become very unbalanced
depending on correlation in the data.
R-Trees: Symmetric multi-dimensional access.
Can deteriorate depending on insertion strategy.
hB-tree: Symmetric multi-dimensional access.
Can turn into a DAG depending on deletion
order.

More Related Content

PPT
Extensible hashing
ZIP
Hashing
PDF
Project - Deep Locality Sensitive Hashing
PPT
4.4 external hashing
PDF
Hybrid geo textual index structure
PPT
Hashing
PDF
PPT
12. Indexing and Hashing in DBMS
Extensible hashing
Hashing
Project - Deep Locality Sensitive Hashing
4.4 external hashing
Hybrid geo textual index structure
Hashing
12. Indexing and Hashing in DBMS

What's hot (18)

PDF
DBMS 9 | Extendible Hashing
PPT
4.4 hashing
PPTX
Hashing in datastructure
PPT
Hashing gt1
PDF
Cwkaa 2010
PPT
PDF
R training2
PPTX
Predicting the relevance of search results for e-commerce systems
PDF
Pattern matching programs
PDF
Chapter 4 algorithmic efficiency handouts (with notes)
PPTX
Regular Expressions in JavaScript and Command Line
PDF
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
PPT
Cosequential processing and the sorting of large files
PPTX
Dynamic Memory & Linked Lists
PPT
Fundamentals of data structures
PDF
RDataMining slides-text-mining-with-r
PDF
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
DBMS 9 | Extendible Hashing
4.4 hashing
Hashing in datastructure
Hashing gt1
Cwkaa 2010
R training2
Predicting the relevance of search results for e-commerce systems
Pattern matching programs
Chapter 4 algorithmic efficiency handouts (with notes)
Regular Expressions in JavaScript and Command Line
Real-time Data De-duplication using Locality-sensitive Hashing powered by Sto...
Cosequential processing and the sorting of large files
Dynamic Memory & Linked Lists
Fundamentals of data structures
RDataMining slides-text-mining-with-r
Data Science Keys to Open Up OpenNASA Datasets - PyData New York 2017
Ad

Viewers also liked (11)

PDF
3 concurrency controlone_v3
PDF
17 wics99 harkey
PPT
10b rm
PPT
16 greg hope_com_wics
PDF
Project description2012
PDF
Solution7.2012
PDF
9 queuing
PDF
2 shadowing
PPT
08 message and_queues_dieter_gawlick
PDF
1 introduction
PPT
14 scaleabilty wics
3 concurrency controlone_v3
17 wics99 harkey
10b rm
16 greg hope_com_wics
Project description2012
Solution7.2012
9 queuing
2 shadowing
08 message and_queues_dieter_gawlick
1 introduction
14 scaleabilty wics
Ad

Similar to 20 access paths (20)

PDF
DataBaseManagementSystems-BTECH--UNIT-5.pdf
PPTX
Hash Table.pptx
PPTX
hashing in data structures and its applications
PPTX
Hashing techniques, Hashing function,Collision detection techniques
PPTX
Hashing Techniques in database management systems
PPTX
B tree ,B plus and graph
PPTX
Hash table
PPTX
DBMS-Unit5-PPT.pptx important for revision
PPTX
Data Structures-Topic-Hashing, Collision
PDF
Hashing and File Structures in Data Structure.pdf
PPTX
B tree
PPTX
hashing in data structure for Btech.pptx
PPTX
hashing in data structure for engineering.pptx
PPTX
hashing in data structure for Btech .pptx
PPTX
Hashing
PPT
Design data Analysis hashing.ppt by piyush
PPTX
Hashing and Collision Advanced data structure and algorithm
PDF
Hash Functions FTW
PPTX
Hashing .pptx
PPT
Indexing.pptvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
DataBaseManagementSystems-BTECH--UNIT-5.pdf
Hash Table.pptx
hashing in data structures and its applications
Hashing techniques, Hashing function,Collision detection techniques
Hashing Techniques in database management systems
B tree ,B plus and graph
Hash table
DBMS-Unit5-PPT.pptx important for revision
Data Structures-Topic-Hashing, Collision
Hashing and File Structures in Data Structure.pdf
B tree
hashing in data structure for Btech.pptx
hashing in data structure for engineering.pptx
hashing in data structure for Btech .pptx
Hashing
Design data Analysis hashing.ppt by piyush
Hashing and Collision Advanced data structure and algorithm
Hash Functions FTW
Hashing .pptx
Indexing.pptvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv

More from ashish61_scs (20)

PDF
7 concurrency controltwo
PDF
Transactions
PPT
22 levine
PDF
21 domino mohan-1
PPT
19 structured files
PPT
18 philbe replication stanford99
PPT
15 bufferand records
PPT
14 turing wics
PPT
13 tm adv
PPT
PPT
10a log
PPT
09 workflow
PPT
06 07 lock
PPT
05 tp mon_orbs
PPT
04 transaction models
PPT
03 fault model
PPT
02 fault tolerance
PPT
01 whirlwind tour
PDF
Solution5.2012
PDF
Solution6.2012
7 concurrency controltwo
Transactions
22 levine
21 domino mohan-1
19 structured files
18 philbe replication stanford99
15 bufferand records
14 turing wics
13 tm adv
10a log
09 workflow
06 07 lock
05 tp mon_orbs
04 transaction models
03 fault model
02 fault tolerance
01 whirlwind tour
Solution5.2012
Solution6.2012

Recently uploaded (20)

PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
Insiders guide to clinical Medicine.pdf
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Pre independence Education in Inndia.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Business Ethics Teaching Materials for college
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PPTX
Pharma ospi slides which help in ospi learning
PDF
VCE English Exam - Section C Student Revision Booklet
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Insiders guide to clinical Medicine.pdf
Supply Chain Operations Speaking Notes -ICLT Program
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Pre independence Education in Inndia.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Business Ethics Teaching Materials for college
Week 4 Term 3 Study Techniques revisited.pptx
FourierSeries-QuestionsWithAnswers(Part-A).pdf
Renaissance Architecture: A Journey from Faith to Humanism
Pharma ospi slides which help in ospi learning
VCE English Exam - Section C Student Revision Booklet
STATICS OF THE RIGID BODIES Hibbelers.pdf

20 access paths

  • 1.  Aug. 2 Aug. 3 Aug. 4 Aug. 5 Aug. 6 9:00 Intro & terminology TP mons & ORBs Logging & res. Mgr. Files & Buffer Mgr. Structured files 11:00 Reliability Locking theory Res. Mgr. & Trans. Mgr. COM+ Access paths 13:30 Fault tolerance Locking techniques CICS & TP & Internet CORBA/ EJB + TP Groupware 15:30 Transaction models Queueing Advanced Trans. Mgr. Replication Performance & TPC 18:00 Reception Workflow Cyberbricks Party FREE Access Paths Chapter 20
  • 2.  Types of Associative Access Paths Primary index: Given the primary key value, find the tuple. Secondary index: Given the value of a non- unique attribute, find all qualified tuples. Range index: Given the value range of some attribute, find all tuples within that range. Structure index: Given some tuple, find all structurally related tuples (CODASYL sets, object hierarchies, etc.)
  • 3.  Two Important Techniques Two basic techniques dominate in modern DBMSs: Hashing: Use a fixed transformation algorithm to convert the attribute value into a database address. Tree search:A dynamical search structure is built that guides the search for a given attribute value to the proper database location. Hashing supports primary indices only. Tree search is more versatile.
  • 4.  The Access Gap Accessing a tuple in buffer costs ca. 2000 instr. Accessing it on disk takes 25 ms for I/O-related activities. On a 20 MIPS machine, this translates into 500,000 instructions. Therefore, one can spend many instructions on an algorithm that saves one I/O on the average. For access paths, the dominant cost measure is the number of different pages that need to be accessed during the search.
  • 6.  Folding vs. Hashing Folding is used to turn an attribute value of arbitrary length and arbitrary data type in to an unsigned integer the maximum length of which is determined by the instruction set. Hashing is used to transform the result of folding into the address of a page that (probably) holds the tuple with the specified key value.
  • 7.  Requirement A good hash function H has to map primary key values, which are very unevenly distributed over a large value range, into a tuple address space that is much smaller (proportional to the number of existing tuples), such that the resulting addresses are evenly distributed over the address range.
  • 8.  Parameters of the Hash Function K: This is the (folded) key value. It varies between 0 and 2**32 - 1. B: Number of pages to be allocated for the file. Depends on the number of tuples expected. H: Hash function that performs a mapping of: (0, 2**32-1) -> (0, B-1).
  • 9.  Consequences of the Approach Contiguous allocation: All B pages must allocated in physical contiguity, because the relative addresses vary between 0 and B-1. Fixed size: The file size must be determined upon creation time, because changing the size (i.e. changing B) means changing the hash function. This in turn requires a complete reorganization.
  • 10.  Requirements for a Hash Function Hash-based allocation assumes that it is possible to estimate the number of tuples T that the relation will have, and that this estimate is not drastically exceeded. If a block has length B, and a tuple has an average length of L bytes, then we need at least S =  T / (B/L)  blocks to store the T tuples. The required number of blocks (S) is allocated before the first tuple is stored. It is a good idea to allocate some more blocks (S’ > S) to allow for unexpected growth. Then a hash function H is defined, which takes in the value of the primary key k of the relation and converts it into a number between 1 and S’; this is the block number where the tuple is to be stored. If K is the set of possible values for the primary key we have: H: K → {1, 2, …, S’} The set of potential values for the primary key attribute will be much larger than the number of blocks allocated (think, e.g., of ISBNs for the books relation). So the hash function is a compacting function. For each primary key value, there is exactly one block it is mapped to. Many different primary key values are mapped to the same block (n:1 relationship).
  • 11.  Properties of Hash Functions A hash function must be easy to compute and must not require access to any blocks in the database. It must be able to map a generally very large set of potential primary key values (remember: primary keys can be constructed by concatenating several attributes of a relation) into a comparatively small set of block numbers in which the tuples will be stored. It must be able to take in primary key values of different data types (integer, binary, decimal, character, etc.) and map them to the set of integers between 1 and S’ with equal efficiency. The formula for estimating S based on the number of tuples, average tuple length and block length implicitly assumes that all blocks are equally filled, i.e. that the same number of tuples is mapped to each block. This is the most difficult requirement, because the primary key values in general are not equally distributed over their value range: Some parts of the value range are used, others are not used at all, keys are generated by some regular mechanism, etc. To achieve this “hashing” property, different methods exist: table look-up, base conversion, folding, encryption, division by prime numbers, etc.
  • 12.  A Popular Hash Function A hash function that most database systems use as a default (if the user does not specify one) is defined as: H(k) := k mod d + 1 This requires that k is a positive integer. If the primary key attribute of the relation does not have the data type integer - it could, for example, be a name - then it has to be turned into an integer first. The usual way of doing this is to “fold” the binary representation of the key value such that its length does not exceed 32 bits. Then these 32 bits are interpreted as an integer; if it is negative, it is multiplied by -1 one. Details of folding are omitted here. For H to be a good hash function, d must be a large enough prime number; this is explained by detailed number theoretic analyses. We also must make sure that the number of blocks allocated is about 25% larger than the minimum requirement. Summing it up: We first compute S. Then we compute S’ = 1.25*S. Then we compute d = next_higher_prime (S’). H(k) will then determine the block for each tuple based on the primary key value.
  • 13.  maximum number of tuples per page (bucket) 1 2 3 5 10 20 40 .50 .60 .70 .80 .85 .90 .92 .95 0.500 0.177 0.087 0.031 0.005 0.000 0.000 0.750 0.293 0.158 0.066 0.015 0.002 0.000 1.167 0.494 0.286 0.136 0.042 0.010 0.001 2.000 0.903 0.554 0.289 0.110 0.036 0.009 2.832 1.316 0.827 0.449 0.185 0.069 0.022 4.495 2.146 1.377 0.777 0.345 0.144 0.055 5.740 1.792 1.024 0.467 0.203 0.0832.768 9.444 4.631 3.035 1.769 0.837 0.386 0.171 U Average Number of Overflow Pages
  • 14.  Hashing for Non-Unique Attributes Let V denote the number of different attribute values. Then we can distinguish three cases: V ~ T: The attribute is “almost” unique; a good hash function should work in that case. V > B: There are more values than buckets. Can be made work, but some buckets may get much higher utilization than others. V < B: This is the case where hashing cannot be used.
  • 15.  buckets overflow pages buckets (Implicit) pointers establishedbythe overflowfunction.In this example the overflowbucket for bucket bis b+3mod B. (Completelyfilledbuckets andpages are shaded.) a) external overflowhandling b) internal overflowhandling Overflow Handling
  • 16.  Hashing: Summary For unique attributes, hashing can yield one- access retrieval. It is critical to find a good hash function to reduce collisions. If the original estimate of the file size is wrong, reorganization is inevitable. Synchronization at the page level is done using standard crabbing techniques. Hashing does not support range queries.
  • 17.  B-Trees B-Trees consist of two types of nodes: Leaf nodes: The contain the data, i.e. the tuples or pointers to the tuples (TIDs). Index nodes: Index nodes contain reference keys to direct the search towards the leaves. The data structure looks like this: struct { char * KeyValue; PAGEID PointerToNextNode; } index_node_structure[ ];
  • 18.  Rules for Index Nodes Key values are in sorted order: K0 ≤ K1 ≤ ... ≤ Ki ≤ ... ≤ Kf (f is max. capacity of a node). For any two adjacent key values Ki, Ki+1 the pointer Pi points to a node covering all values in the interval (Ki, Ki+1]. If a search for value v arrives at an index node, the next node to be visited is pointed to by Pi such that Ki ≤ v < Ki+1. K0 is an arbitrary “low value” (smaller than anything else in that node).
  • 19.  Properties of a B-Tree Parameter f is called the fan-out of the tree. The number of nodes visited from the root to a leaf is called the height of the tree. A B-tree is always perfectly balanced, i.e. the height is the same for all leaves. Storage utilization is at least 50% for all nodes except for the root. Average storage utilization is close to 70%.
  • 20.  120 136 140 151 220 255 256 271 296 299 312 318 140 168 170 190 271 312 220 168 root node 303 leaf node L1 leaf node L2 leaf node L3 index node I The shaded parts at the beginning of the index nodes represent the low value key (K0). A Simple B-Tree
  • 21.  Some Observations B-trees also work for non-unique attributes; implementational optimizations will be discussed later on. The reference keys in the index nodes can be different from all “real” key values in the leaves; they only have to guide the search correctly. The key values at the leaf level are sorted in ascending order; this supports range queries.
  • 22.  255 256 271 296 299 312 318 271 312 303 leaf node L1 leaf node L2 leaf node L3 index node I key value 280 271 312 index node I L1 299 L3 271 296 leaf node L2 299 303 leaf node Ln key value 280 before split of index node I after split of index node I insert insert Inserting Into a B-Tree
  • 23.  Growing a B-Tree If the insert leaf is full, allocate a new node, distribute the values (the new one sorted in place) evenly across the old leaf and the new node, move the lowest key value of the new node up to the index node. If that index node is full, split it in the same way. If the root has to be split: Allocate two new nodes, distribute the key values evenly over them, put the reference key in the root.
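The leaf-split step above can be sketched for a toy leaf of integer keys. The capacity F = 4 and the in-memory layout are illustrative assumptions; a real implementation splits pages and posts the separator under semaphore protection.

```c
#include <assert.h>

#define F 4  /* tiny leaf capacity, for illustration only */

typedef struct { int n; int key[F + 1]; } Leaf;

/* Insert k into the full leaf 'old' (old->n == F): allocate a new node
   'new_', distribute the F+1 values evenly (the new one sorted in place),
   and return the lowest key of the new node, which moves up to the
   index node as the separator. */
int split_insert(Leaf *old, Leaf *new_, int k)
{
    int all[F + 1], i, j = 0;
    /* merge k into its sorted position */
    for (i = 0; i < F && old->key[i] < k; i++) all[j++] = old->key[i];
    all[j++] = k;
    for (; i < F; i++) all[j++] = old->key[i];
    /* distribute evenly: lower half stays, upper half moves */
    old->n = (F + 1) / 2 + (F + 1) % 2;   /* ceil((F+1)/2) */
    new_->n = (F + 1) - old->n;
    for (i = 0; i < old->n; i++) old->key[i] = all[i];
    for (i = 0; i < new_->n; i++) new_->key[i] = all[old->n + i];
    return new_->key[0];  /* separator posted to the parent */
}
```

Splitting an index node works the same way; only the root split differs, because two new nodes are allocated so the root page itself can stay in place.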
  • 24.  Deleting Tuples From a B-Tree To maintain the space utilization guarantees, a leaf that becomes under-utilized (< 50%) would have to be merged with its neighbours. This is a very costly operation; in particular, synchronization at the page level is very complicated. Therefore, most systems let nodes become empty and discard them when that happens. Analyses show that this does not degrade overall B-tree performance.
  • 25.  Non-Unique Attributes [Figure: three ways of representing non-unique attribute values at the leaf level: (1) each key value (23, 28, 29) stored once with a counted pointer list (P1 … P6); (2) each key value stored once with a list of references R1, R2, R3; (3) the key value repeated for every pointer (23 P1, 23 P2, 23 P3, 28 P4, 29 P5, 29 P6).]
  • 26.  The Basic Formula of B-Tree Performance With N: number of tuples, C: average number of entries in a leaf, F: average number of entries in an index node, the height H of a B-tree is H = 1 + ⌈logF(N/C)⌉
  • 27.  Some Performance Figures

        Key-sequenced file (C* = 43)         Secondary index (C* = 300)
  H     N(max)            increase           N(max)             increase
  2     12,900            -                  90,000             -
  3     3,870,000         3,857,100          27,000,000         26,910,000
  4     1,161,000,000     1,157,130,000      8,100,000,000      8,073,000,000
  5     348,300,000,000   347,139,000,000    2,430,000,000,000  2,421,900,000,000
  • 28.  Tuples in the Leaves? Assuming a tuple is x times longer than a TID, we get the following estimate: 1 + logF (N/(x·C)) + 1.1 ≤ 1 + logF (N/C). This transforms into 1.1 ≤ logF x. When this holds, moving the tuples out of the leaves improves performance.
  • 29.  Key Compression: Suffix Compression [Figure: (a) full key values (Bertolucci, Cooperativa, Copelletti, Gambogi) stored in all nodes of the B*-tree; (b) suffix compression for key values in high-level nodes, which keeps only the distinguishing prefixes (Bert, Cop, Gam).]
  • 30.  Synchronization on B-Trees: What Is the Problem? B-Trees are fully redundant structures, which can be reconstructed from the tuples; therefore, no synchronization should be required at all. However, some queries operate on the index only. This requires all operations on B-trees to be serializable with the operations on the tuples. Standard two-phase locking with the nodes as the objects is not feasible for performance reasons.
  • 31.  Protecting Tree Traversal [Figure: node Q at level i and node R at level i+1, connected by the search path.] 1. semaphore on Q; 2. follow search path; 3. semaphore on R; 4. release sem. on Q. This is an example of the crabbing technique.
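The four steps can be sketched with pthread mutexes standing in for the page semaphores (compile with -pthread). The two-way node layout and integer keys are toy assumptions; the point is only the lock order: the child's semaphore is acquired before the parent's is released, so the search path is never unprotected.

```c
#include <pthread.h>
#include <assert.h>

/* Toy two-way index node; child[0] == NULL marks a leaf. */
typedef struct Node {
    pthread_mutex_t sem;
    struct Node *child[2];
    int key;          /* separator: go left if v < key */
    int leaf_value;   /* payload when this node is a leaf */
} Node;

int crab_search(Node *root, int v)
{
    Node *q = root;
    pthread_mutex_lock(&q->sem);                 /* 1. semaphore on Q */
    while (q->child[0] != NULL) {
        Node *r = q->child[v < q->key ? 0 : 1];  /* 2. follow search path */
        pthread_mutex_lock(&r->sem);             /* 3. semaphore on R */
        pthread_mutex_unlock(&q->sem);           /* 4. release sem. on Q */
        q = r;
    }
    int result = q->leaf_value;
    pthread_mutex_unlock(&q->sem);
    return result;
}
```

Because at most two semaphores are held at any time, readers descending concurrently do not serialize on the whole tree, unlike naive root-to-leaf locking.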
  • 32.  B-Trees and Value Locks [Figure: a B*-tree leaf level with key values 14, 16, 17, 20, 22 and pointers P1 … P10 — the entries that value locks must protect.]
  • 33.  Making Lock Names To implement value locking, we need to build lock names according to the following rule: LockN := {TableName, IndexName, KeyValue}. KeyValue in turn is a composite: KeyValue := {AttributeValue, TupleIdentifier}.
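A minimal sketch of the naming rule; encoding the components as a '/'-separated string, and the sample table and index names, are assumptions for illustration (a real lock manager would hash the components into a fixed-size lock name):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* LockN = {TableName, IndexName, KeyValue},
   KeyValue = {AttributeValue, TupleIdentifier}. */
void make_lock_name(char *buf, size_t len,
                    const char *table, const char *index,
                    const char *attr_value, unsigned long tid)
{
    snprintf(buf, len, "%s/%s/%s/%lu", table, index, attr_value, tid);
}
```

Appending the tuple identifier to the attribute value is what makes the scheme work for non-unique attributes: each of several entries with the same attribute value still gets a distinct lock name.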
  • 34.  Key Range Locking on B-Trees Simple retrieval: k = c Get a semaphore on leaf page; get S-lock on key range defined by largest existing value c1 with c1 ≤ c; hold lock until commit.
  • 35.  Key Range Locking on B-Trees Range retrieval: c1 < k < c2 Get a semaphore on first leaf page; get S-lock on key defined by largest existing value c3 with c3 < c1; proceed sequentially along leaf level; request key range S-lock for each new attribute value up to and including c2; do careful crabbing across leaf pages; hold S-locks until commit.
  • 36.  Key Range Locking on B-Trees Insert: [c, ki] Get X-semaphore on leaf page; find largest existing value c1 with c1 < c; request instant IX-lock on c1; request long X-lock on c.
  • 37.  Key Range Locking on B-Trees Delete: [c, kd] Get X-semaphore on leaf page; find largest existing value c1 with c1 < c; if further entries with value c remain, request long IX-lock on c; else request long X-lock on c and c1.
  • 38.  B-Tree Recovery [Figure: transaction T1 inserts into page Pa, causing a split into Pa and Pa'; transaction T2 then inserts into page Pa'. Both pages hang off index node Pi.]
  • 39.  B-Tree Recovery Based on Physiological Logging Cover all B-tree operations with semaphores on all affected pages. For each logical update, a log record with the logical UNDO operation must be moved to the log. While the update operation is being performed, physical REDO log records are written. After all REDO records are safely in the log, the exclusive semaphores can be released.
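The required ordering (logical UNDO first, physical REDO records during the update, semaphores released only afterwards) can be sketched with the log modeled as an append-only array of tags; the record names and the two-page update are illustrative assumptions.

```c
#include <assert.h>
#include <string.h>

/* Toy write-ahead log: an append-only array of record tags. */
enum { MAXLOG = 16 };
static const char *wal[MAXLOG];
static int wal_len = 0;

static void log_append(const char *rec) { wal[wal_len++] = rec; }

void btree_update(void)
{
    /* 1. exclusive semaphores on all affected pages (elided here) */
    log_append("UNDO-logical");      /* 2. logical UNDO record first */
    log_append("REDO-physical-p1");  /* 3. physical REDO per modified page, */
    log_append("REDO-physical-p2");  /*    written while updating */
    /* 4. force REDO records to the log, then release the semaphores */
}
```

The invariant to check after a crash is exactly this ordering: the logical UNDO record is always in the log before any of the physical REDO records it compensates.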
  • 40.  The Two Phases of B-Tree Recovery Phase 1: Go forward through the log up to its current end, applying all REDO records to the tree. Phase 2: Go backward to the Begin-of-transaction record of the oldest incomplete transaction, executing the UNDO operations on the tree for all losers along the way.
  • 41.  Other Access Path Methods: Extendible Hashing [Figure: the magic of hashing maps a key value into a pointer table (directory), whose entries point to the buckets (pages).]
  • 42.  New Techniques Grid files: Symmetric multi-dimensional point access. Can become very unbalanced depending on correlation in the data. R-trees: Symmetric multi-dimensional access. Can deteriorate depending on insertion strategy. hB-trees: Symmetric multi-dimensional access. Can turn into a DAG depending on deletion order.