3/6/2024
Indexing
Index Concept
Main idea: A separate data structure used to
locate records
Many flavors of index organization have been proposed
and tried, including structures that combine hashing and
indexing
Various buffering schemes are somewhat orthogonal
We’ll focus on
General concepts [chapter 4.3-4.4]
ISAM (indexed sequential) [ch. 5.1]
B-trees and B+ trees [ch. 5.2-5.8]
Index Terminology
Most generally, an index is a list of value/address
pairs
Each pair is an index “entry”
Value is the index “key”
Address will point to a data record, or to a data page
There might be many records on a page
The assumption is that the value/address pair will be
much smaller in size than the full record
If index is small, a copy can be maintained in
memory!
Permanent disk copy is still needed
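The value/address idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `pages` dictionary, the (page number, slot) address format, and the `lookup` helper are hypothetical names, not something the slides define.

```python
from bisect import bisect_left

# Hypothetical data file: page number -> list of (key, record) pairs.
pages = {
    0: [(17, "Ann"), (42, "Bob")],
    1: [(58, "Cy"), (73, "Dee")],
}

# The index: a sorted list of (key, address) entries. Each address is
# assumed to be a (page number, slot) pair.
index = sorted(
    (key, (page_no, slot))
    for page_no, records in pages.items()
    for slot, (key, _) in enumerate(records)
)
keys = [key for key, _ in index]

def lookup(key):
    """Return the record stored under key, or None if it is absent."""
    i = bisect_left(keys, key)          # binary search over the index keys
    if i == len(keys) or keys[i] != key:
        return None
    page_no, slot = index[i][1]         # follow the address to the data page
    return pages[page_no][slot][1]
```

Each entry holds only a key and a small address, so the index stays much smaller than the records it points to.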
Key Terminology
Index key field
Not necessarily the same as the primary DB key of the
table!
But called a “key” anyway
Primary index
Key is the primary (DB) key
Only one index per file
Secondary index
Key is not the primary DB key
Could be many indices per file (or none)
More Indexing Terminology
Dense index
One index entry for each record (or page)
Non-dense or sparse index
Fewer than one index entry per record (e.g., one entry per page)
Inverted file:
File which has a dense secondary index
Clustering index
Preserves locality: index entries that are close together refer to
data records that are close together
Multilevel indexing
each level is an index to the next level down
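To make the dense/sparse distinction concrete, here is a minimal sketch. It assumes a sorted data file with three records per page; `data_pages`, `dense`, `sparse`, and `sparse_lookup` are illustrative names only.

```python
from bisect import bisect_right

# Hypothetical sorted data file, three (key, value) records per page.
data_pages = [
    [(3, "a"), (8, "b"), (12, "c")],
    [(15, "d"), (21, "e"), (27, "f")],
    [(30, "g"), (34, "h"), (40, "i")],
]

# Dense index: one entry per record, mapping key -> (page, slot).
dense = {key: (p, s)
         for p, page in enumerate(data_pages)
         for s, (key, _) in enumerate(page)}

# Sparse index: one entry per page, holding the first key on that page.
sparse = [page[0][0] for page in data_pages]

def sparse_lookup(key):
    """Find the one page that could hold key, then scan that page."""
    p = bisect_right(sparse, key) - 1   # last page whose first key <= key
    if p < 0:
        return None
    for k, value in data_pages[p]:
        if k == key:
            return value
    return None
```

The sparse index here has 3 entries against 9 for the dense one, at the cost of a short scan inside the page. It only works because the data file itself is kept in key order.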
Indexing Pitfalls
Index itself is a file
Occupies disk space
Must worry about maintenance, consistency, recovery,
etc.
Large indices won't fit in memory
May require multiple seeks to locate record entry
Desiderata for Multilevel Indexes
Should support efficient random access
Should also support efficient sequential access, if
possible
Should have low height
Should be efficiently updatable
Should be storage-efficient
Top level(s) should fit in memory
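The "low height" goal can be checked with a little arithmetic. The sketch below assumes every index page holds up to `fanout` entries and counts levels until a single page remains; the function name and parameters are hypothetical.

```python
def index_levels(num_pages, fanout):
    """Count the index levels needed so that the top level fits on one page.

    num_pages is the number of data pages; fanout is the assumed number of
    entries that fit on one index page.
    """
    levels = 0
    entries = num_pages
    while True:
        levels += 1
        index_pages = -(-entries // fanout)   # ceiling division
        if index_pages <= 1:
            return levels
        entries = index_pages                 # next level indexes these pages
```

With 1,000,000 data pages and a fan-out of 100, `index_levels` returns 3. With a fan-out of 2 (a binary tree) it returns 20, which is why high fan-out matters when every level may cost a disk seek.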
ISAM
= Indexed Sequential Access Method
IBM terminology
“Indexed Sequential” more general term (non-IBM)
ISAM as described in textbook (5.1) is very close to B+
tree
simpler versions exist
Main idea: maintain sequential file but give it
an index
Sequentiality for efficient “batch” processing
Index for random record access
ISAM Technique
Build a dense index of the pages (1st level index)
Sparse from a record viewpoint
Then build an index of the 1st level index (2nd
level index)
Continue recursively until top level index fits on 1
page
Some implementations may stop after a fixed # of
levels
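The recursive construction above might be sketched as follows. This is an illustration, not the textbook's algorithm: data pages are assumed to be non-empty sorted lists of keys, each index entry is a (first key, child number) pair, and `build_isam` and `search` are made-up names.

```python
from bisect import bisect_right

def build_isam(data_pages, fanout):
    """Build index levels bottom-up. Assumes data_pages is non-empty.

    levels[0] is the 1st-level index (one entry per data page); the last
    element is the top level, which fits on a single index page.
    """
    entries = [(page[0], i) for i, page in enumerate(data_pages)]
    levels = []
    while True:
        index_pages = [entries[i:i + fanout]
                       for i in range(0, len(entries), fanout)]
        levels.append(index_pages)
        if len(index_pages) == 1:
            return levels
        # Index the level just built: one entry per index page.
        entries = [(page[0][0], i) for i, page in enumerate(index_pages)]

def search(levels, data_pages, key):
    """Walk from the top level down to one data page; True if key is there."""
    page_no = 0
    for index_pages in reversed(levels):
        page = index_pages[page_no]
        keys = [k for k, _ in page]
        slot = max(bisect_right(keys, key) - 1, 0)
        page_no = page[slot][1]
    return key in data_pages[page_no]

# Nine data pages of three keys each: [0, 1, 2], [10, 11, 12], ...
data_pages = [[10 * p + r for r in range(3)] for p in range(9)]
levels = build_isam(data_pages, fanout=3)
```

With nine data pages and a fan-out of 3, this produces two index levels: three 1st-level pages and one top-level page.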
Updating an ISAM File
Data set must be kept sequential
So that it can be processed without the index
May have to rewrite entire file to add records
Could use overflow pages
chained together or in fixed locations (overflow area)
Index is usually NOT updated as records are
added!
Once in a while the whole thing is “reorganized”
Data pages recopied to eliminate overflows
Index recreated
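A rough sketch of the overflow-chain approach and the periodic reorganization follows. It is simplified on purpose: the `Page` class and helper names are hypothetical, pages hold bare keys, and new keys always go to the overflow chain once the primary page is full. Note that `insert` never touches an index.

```python
class Page:
    def __init__(self, keys, capacity):
        self.keys = sorted(keys)
        self.capacity = capacity
        self.overflow = None    # next page in this page's overflow chain

def insert(page, key):
    """Add key to a primary page, spilling into its overflow chain when full."""
    while len(page.keys) >= page.capacity:
        if page.overflow is None:
            page.overflow = Page([], page.capacity)
        page = page.overflow
    page.keys.append(key)
    page.keys.sort()

def contains(page, key):
    """Search a primary page and then every page on its overflow chain."""
    while page is not None:
        if key in page.keys:
            return True
        page = page.overflow
    return False

def reorganize(pages, capacity):
    """Recopy every key into fresh, sorted primary pages with no overflows."""
    all_keys = []
    for page in pages:
        while page is not None:
            all_keys.extend(page.keys)
            page = page.overflow
    all_keys.sort()
    return [Page(all_keys[i:i + capacity], capacity)
            for i in range(0, len(all_keys), capacity)]
```

The longer a chain grows, the more pages `contains` has to read, which is the inefficiency that reorganization removes. After `reorganize`, the index would have to be rebuilt over the new pages.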
ISAM Pros, Cons
Pro
Relatively simple
Great for true sequential access
Cons
Not very dynamic
Inefficient if lots of overflow pages
Can only be one ISAM index per file
B-Tree
B-Tree is a type of multilevel index
from another standpoint: it's a type of balanced tree
Invented in 1972 by Boeing engineers R. Bayer
and E. McCreight
By 1979: "the standard organization for indexes in
a database system" (Comer)
B-Tree Overview
Assume for now that keys are fixed-length and
unique
A B-tree can be thought of as a generalized
binary search tree
multiple branches rather than just L or R
Trees are always perfectly balanced
Some wasted space in the nodes is tolerated
B-Tree Concepts
Each node contains
tree (index node) pointers, and
key values (with record or page pointers)
Given a key K and the two node pointers L and R
around it
All key values pointed to by L are < K
All key values pointed to by R are > K
“Order p” means (up to) p tree pointers, (up to) p-1 keys
Terminology differs between authors
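The ordering rule (everything under L is < K, everything under R is > K) is what makes search work. A minimal sketch, assuming unique keys and leaving out the record pointers; `BTreeNode` and `search` are illustrative names:

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted; at most p - 1 keys for order p
        self.children = children or []   # empty for a leaf, else len(keys) + 1

def search(node, key):
    """Return True if key is stored anywhere in the subtree rooted at node."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True                  # original B-tree: a hit can occur at any level
        if not node.children:
            return False                 # reached a leaf without finding key
        node = node.children[i]          # subtree between keys[i - 1] and keys[i]

# A small order-3 tree: up to 3 tree pointers and 2 keys per node.
root = BTreeNode([20, 40], [
    BTreeNode([5, 10]),
    BTreeNode([25, 30]),
    BTreeNode([45, 50]),
])
```

Each iteration reads one node, so the number of page reads is bounded by the height of the tree.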
B+ Tree vs. B-tree
Textbook only discusses B+ trees
So do we from now on
Two big differences:
Original B-trees had record pointers in all of the index
nodes; B+ trees have them only in leaf nodes
So every key must also appear in a leaf: given a key K and the
two node pointers L and R around it
• All key values pointed to by L are < K
• All key values pointed to by R are >= K
B+ tree data pages are linked together to form a
sequential file
Gives the advantages of ISAM
In our book, it’s a doubly-linked list
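Both differences show up in a range scan: descend once using the "< K goes left, >= K goes right" rule, then follow the leaf links. The sketch below is an assumption-laden illustration; `Leaf`, `Inner`, `find_leaf`, and `range_scan` are hypothetical names, and leaves hold bare keys rather than record pointers.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.prev = None     # doubly linked, as in the slides' textbook
        self.next = None

class Inner:
    def __init__(self, keys, children):
        self.keys = keys            # separator keys only, no record pointers
        self.children = children    # len(keys) + 1 subtrees

def find_leaf(node, key):
    """Descend to the leaf that would hold key."""
    while isinstance(node, Inner):
        # bisect_right sends key == K to the right of K, matching ">= K".
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, low, high):
    """Collect all keys in [low, high] by walking the linked leaves."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result

leaves = [Leaf([1, 4]), Leaf([7, 9]), Leaf([12, 15])]
for left, right in zip(leaves, leaves[1:]):
    left.next, right.prev = right, left
root = Inner([7, 12], leaves)
```

After the initial descent, the scan never goes back up the tree, which is the ISAM-style sequential access the linked leaves provide.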
Alternate Views of the Leaf Nodes
[cf. Chapter 4.3.1]
Leaf nodes might be actual data pages
Leaf nodes might contain pointers to the actual data
records or pages
For B+ trees, this implies the leaf node format is
different from the non-leaf node format
may hold different number of entries
The leaf nodes can be chained together,
regardless of whether the actual data pages are!
B+Tree Growth and Change
The big idea: When a node is full, it splits.
middle value is propagated upward
If we’re lucky, there’s room for it in the level above
two new nodes are at same level as original node
Height of tree increases only when the root splits
A very nice property
This is what keeps the tree perfectly balanced
Recommended: split only “on the way down”
On deletion: two adjacent nodes recombine if both
are < half full
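The split step itself is small. A sketch under the B+ tree convention described earlier (keys >= K live to the right of K); the function names are hypothetical and nodes are represented as plain lists:

```python
def split_leaf(keys):
    """Split a full B+ tree leaf; return (left keys, separator, right keys).

    The separator is copied upward: it also stays in the right leaf,
    because every key must still be present at the leaf level.
    """
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    return left, right[0], right

def split_inner(keys, children):
    """Split a full inner node; return (left node, separator, right node).

    Here the middle key moves upward and is kept in neither half.
    """
    mid = len(keys) // 2
    separator = keys[mid]
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, separator, right
```

For example, `split_leaf([10, 20, 30, 40])` gives `([10, 20], 30, [30, 40])`. The caller inserts the separator into the parent, and if the parent is full it splits in turn, which is how a split can propagate all the way to the root.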
Variations
Could redistribute records between adjacent
blocks
esp. on deletion (B* tree)
Variable order: accommodate varying key lengths
Could store the whole record in the index block
especially if records are few and small
in a B+ tree, this would make sequential access
especially efficient
B+ Trees with Other Indices
Suppose you have a B+ tree for the file
Leaf nodes of the index are the actual pages of the file,
doubly linked together for sequential access
Suppose you have some secondary indices
What happens when a B+ tree node splits or
merges???
Other Forms of Indexing
Bitmap indexes
One index per value (property) of interest
One bit per record
TRUE if record has a particular property
Indexed hash: hash function takes you to an entry
in an index
allows physical record locations to change
Clever indexing schemes are useful in optimizing
complex queries
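A bitmap index can be sketched with Python integers used as bit strings, where bit i is set when record i has the property. The sample records and helper names below are made up for illustration.

```python
# Hypothetical records; the list position serves as the record number.
records = [
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "small"},
    {"color": "red", "size": "small"},
    {"color": "green", "size": "large"},
    {"color": "red", "size": "large"},
]

def build_bitmaps(records, field):
    """Build one bitmap per distinct value of field; bit i stands for record i."""
    bitmaps = {}
    for pos, rec in enumerate(records):
        bitmaps[rec[field]] = bitmaps.get(rec[field], 0) | (1 << pos)
    return bitmaps

def matching_positions(bitmap):
    """List the record numbers whose bit is set."""
    return [pos for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]

color = build_bitmaps(records, "color")
size = build_bitmaps(records, "size")

# color = 'red' AND size = 'large' becomes a single bitwise AND.
red_and_large = matching_positions(color["red"] & size["large"])
```

Here `red_and_large` is `[0, 4]`. Combining conditions with `&` and `|` before touching any data record is one way such indexes help with complex queries.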
