RELATIONAL DATABASE MANAGEMENT SYSTEM
UNIT – III
DATA STORAGE AND INDEXING
Storage And File Structure
Why do we need to know about storage and file structure?
Many database technologies are designed to exploit the
storage architecture/hierarchy.
Data in the database needs to be organized so that it can be
stored and retrieved efficiently.
Storage Hierarchy
[Figure: storage hierarchy ordered by speed and unit price, from top to bottom: cache, main memory (volatile primary storage); flash memory, magnetic disk (non-volatile secondary storage); optical disk, magnetic tape (non-volatile tertiary storage).]
Primary Storage (Volatile)
Cache
Speed: 7 to 20 ns (1 nanosecond = $10^{-9}$ seconds)
Capacity:
A typical PC level 2 cache is 64 KB to 2 MB.
Within processors, level 1 cache usually ranges in size from 8
KB to 64 KB.
Main memory
Speed: 10s to 100s of nanoseconds
Capacity:
Up to a few gigabytes widely used currently
(per-byte costs have decreased by roughly a factor of 2 every 2 to 3
years)
Secondary Storage (Non-volatile)
Flash memory
Speed: Read speed similar to main memory, write is much slower
Capacity: 32 MB to 512 MB currently
Forms: SmartMedia, memory stick, secure digital, BIOS
Cost: roughly same as main memory
Magnetic disk
Capacities: up to roughly 100 GB currently; growing constantly
and rapidly with technology improvements.
1/14/2005
Yan Huang - CSCI5330 Database Implementation –
Storage and File Structure
Tertiary Storage (Non-volatile)
• Optical storage
• CD-ROM (640 MB) and DVD (4.7 to 17 GB) are the most popular
forms
• CD-RW, DVD-RW, and DVD-RAM
• Reads and writes are slower than with magnetic disk
• Juke-box systems, with large numbers of removable disks,
a few drives, and a mechanism for automatic
loading/unloading of disks, are available for storing large
volumes of data
Indexing:
* Indexing in database systems is similar to the index we see
in books.
* Indexing is a data structure technique for efficiently
retrieving records from database files based on the
attributes on which the index has been built.
* An index is defined by its indexing attributes.
Indexing can be of the following types:
* Primary Index - A primary index is defined on an
ordered data file. The data file is ordered on a key field, which
is generally the primary key of the relation.
Indexing Types:
Secondary Index - A secondary index may be
built on a field that is a candidate key and has a
unique value in every record, or on a non-key field with duplicate
values.
Clustering Index - A clustering index is defined on an
ordered data file. The data file is ordered on a non-key field.
Ordered Indexing:
Dense Index - In a dense index, there is an index
record for every search-key value in the database. This makes
searching faster but requires more space to store the index
records themselves. Each index record contains a search-key value and a
pointer to the actual record on the disk.
Sparse Index:
In a sparse index, index records are not created for
every search key. Each index record contains a search key
and a pointer to the data on the disk.
To search for a record, we first follow the index record with the
largest search key that does not exceed the key we want, and reach that location in the data file.
If the data we are looking for is not at the location
reached through the index, the system scans
sequentially from there until the desired data is found.
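The sparse-index lookup described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data file is modelled as a list of sorted in-memory blocks, the index keeps one entry per block, and the names BLOCKS, SPARSE_KEYS and sparse_lookup are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sorted data file: each block holds (key, value) records in key order.
BLOCKS = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block (the first key stored in that block).
SPARSE_KEYS = [block[0][0] for block in BLOCKS]


def sparse_lookup(key):
    """Return the value stored under key, or None if the key is absent."""
    # Find the last index entry whose key is <= the search key.
    pos = bisect_right(SPARSE_KEYS, key) - 1
    if pos < 0:
        return None  # the key is smaller than every indexed key
    # Sequential scan inside the block the index entry points to.
    for k, value in BLOCKS[pos]:
        if k == key:
            return value
        if k > key:
            break  # records are sorted, so the key cannot appear later
    return None


print(sparse_lookup(50))
print(sparse_lookup(55))
```

A dense index would instead keep one entry per search-key value, which removes the sequential scan at the cost of a larger index.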
Multilevel Indexing:
Index records comprise search-key values and data pointers.
A multilevel index is stored on the disk along with the actual
database files.
As the size of the database grows, so does the size of the
indices.
If a single-level index is used, a large index cannot be
kept in memory, which leads to multiple disk accesses.
A multilevel index breaks the index down into
several smaller indices in order to make the outermost level so small
that it can be saved in a single disk block.
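As a rough illustration of why a few levels are usually enough, the sketch below counts how many index levels are needed before the outermost level fits in one block. The function name and both parameters are assumptions made for the example; real systems also account for pointer sizes and partially filled blocks.

```python
def index_levels(num_entries, fanout):
    """Count index levels needed until the outermost level fits in one block.

    num_entries: index entries at the innermost level (hypothetical input).
    fanout: index entries that fit in one disk block (hypothetical input).
    """
    if num_entries < 1 or fanout < 2:
        raise ValueError("need num_entries >= 1 and fanout >= 2")
    levels = 1
    blocks = -(-num_entries // fanout)  # ceiling division
    while blocks > 1:
        # Add one more level holding one entry per block of the level below.
        blocks = -(-blocks // fanout)
        levels += 1
    return levels


print(index_levels(1_000_000, 100))
```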
Disk:
Hard disk drives are the most common secondary storage
devices in present computer systems. They are called magnetic disks
because they use magnetization to store information.
Hard disks are formatted in a well-defined order to store data
efficiently. A hard disk platter has many concentric circles on it,
called tracks. Every track is further divided into sectors. A sector on
a hard disk typically stores 512 bytes of data.
Disk Subsystem
Disk interface standards families
• ATA (AT Attachment) range of standards
• SCSI (Small Computer System Interface) range
of standards.
Disk Speed
• Seek time (milliseconds)
• Rotation time/latency (milliseconds)
• Data-transfer rate (4-8 MB/sec)
Typical numbers:
• 16,000 tracks per platter
• 200 to 400 sectors per track
• 512 bytes per sector
• 4-16 KB per block
• 5,400 - 15,000 rpm
Access time = seek time + latency
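The access-time formula can be tried out with the rotation speeds listed above. The sketch assumes the usual convention that average rotational latency is half of one full rotation; the function names and the 4 ms seek time are hypothetical values chosen for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Average rotational latency in ms, taken as half of one full rotation."""
    ms_per_rotation = 60_000 / rpm
    return ms_per_rotation / 2


def access_time_ms(seek_ms, rpm):
    """Access time = seek time + average rotational latency."""
    return seek_ms + avg_rotational_latency_ms(rpm)


# Hypothetical drive: 4 ms average seek, 15,000 rpm.
print(access_time_ms(4, 15_000))
print(round(avg_rotational_latency_ms(5_400), 2))
```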
Discuss ways to improve disk
reading speed
Redundant Array of Independent Disks
RAID, or Redundant Array of Independent Disks,
is a technology that connects multiple secondary storage
devices and uses them as a single storage medium.
RAID consists of an array of disks in which
multiple disks are connected together to achieve different
goals. RAID levels define how the disk array is used.
RAID 0
In this level, a striped array of disks is implemented. The
data is broken down into blocks and the blocks are distributed
among disks. Each disk receives a block of data to write/read in
parallel. This improves the speed and performance of the storage
device. There is no parity and no redundancy in level 0.
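Block-level striping amounts to a round-robin mapping from logical block numbers to disks. The sketch below shows one simple mapping; the function name is hypothetical and real controllers may use larger stripe units.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return logical_block % num_disks, logical_block // num_disks


# Consecutive logical blocks land on different disks, so they can be
# read or written in parallel.
for block in range(6):
    print(block, stripe_location(block, 4))
```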
RAID 1
RAID 1 uses mirroring. When data is sent to
a RAID controller, it sends a copy of the data to all the disks in the
array. RAID level 1 is also called mirroring and provides
100% redundancy in case of a failure.
RAID 2
RAID 2 records an Error Correction Code (ECC), a Hamming
code, for its data, which is striped across different disks. Each
data bit in a word is recorded on a separate disk, and the ECC codes of
the data words are stored on a different set of disks. Due to its
complex structure and high cost, RAID 2 is not commercially
available.
RAID 3
RAID 3 stripes the data onto multiple disks. The parity
generated for each data word is stored on a different disk. This
technique allows the array to survive a single disk failure.
RAID 4
In this level, an entire block of data is written onto data
disks and then the parity is generated and stored on a different
disk. Note that level 3 uses byte-level striping, whereas level 4
uses block-level striping. Both level 3 and level 4 require at
least three disks to implement RAID.
RAID 5
RAID 5 writes whole data blocks onto different disks, but
the parity generated for each stripe of data blocks is distributed
among all the disks rather than stored on a
dedicated disk.
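The parity used by RAID levels 3 to 5 is commonly a byte-wise XOR of the data blocks in a stripe, which is what lets a single lost block be rebuilt. The sketch below illustrates that idea only; the function names are hypothetical and the placement of parity across disks is not modelled.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical stripe of three data blocks.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] failed and rebuild its block.
print(rebuild_missing([data[0], data[2]], parity) == data[1])
```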
RAID 6
RAID 6 is an extension of level 5. In this level, two
independent parities are generated and stored in distributed fashion
among multiple disks. Two parities provide additional fault
tolerance. This level requires at least four disk drives to implement
RAID.
File Organization
File organization defines how file records are mapped onto
disk blocks. There are four types of file organization for
file records:
Heap File Organization
When a file is created using Heap File Organization, the
Operating System allocates memory area to that file without
any further accounting details.
File records can be placed anywhere in that memory
area.
It is the responsibility of the software to manage the
records.
Heap File does not support any ordering, sequencing, or
indexing on its own.
Sequential File Organization
Every file record contains a data field (attribute) to uniquely
identify that record.
In sequential file organization, records are placed in the file
in some sequential order based on the unique key field or search
key.
Practically, it is not possible to store all the records
sequentially in physical form.
Hash File Organization
Hash File Organization uses Hash function computation
on some fields of the records.
The output of the hash function determines the location
of disk block where the records are to be placed.
Clustered File Organization
Clustered file organization is not considered good
for large databases. In this mechanism, related records from
one or more relations are kept in the same disk block, that
is, the ordering of records is not based on primary key or
search key.
Hashing:
Hashing uses hash functions with search keys as
parameters to generate the address of a data record.
Bucket - A hash file stores data in bucket
format. A bucket is considered a unit of storage. A bucket
typically stores one complete disk block, which in turn
can store one or more records.
Hash function - A hash function, h, is a mapping function that
maps the set of all search keys K to the addresses where
the actual records are placed. It is a function from search
keys to bucket addresses.
B+ Tree:
A B+ tree is a (key, value) storage method in a tree-like
structure. A B+ tree has one root, any number of levels of intermediary
nodes (often just one), and a level of leaf nodes. The leaf nodes
hold the actual records or pointers to them. Intermediary nodes hold
only search keys and pointers to the nodes below them; they do not hold any data.
In the simplest illustration each node has just two children, but in general a node
can have many more. This is the basis of any B+ tree.
STRUCTURE OF B+ TREE
• A B+ tree index is a multilevel index, but its structure differs from that of
the multilevel index-sequential file.
• The bucket structure is used only if the search key does not form a primary key and
the file is not sorted in search-key order.
QUERIES ON B+ TREE
• Processing queries using a B+ tree: find all the records with a search-key
value of k.
• Leaf nodes must have between 2 and 4 values ($\lceil (n-1)/2 \rceil$ and $n-1$, with
n = 5).
• Non-leaf nodes other than the root must have between 3 and 5
children ($\lceil n/2 \rceil$ and $n$, with n = 5).
• The root must have at least 2 children.
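The occupancy bounds above follow directly from the ceiling formulas, so they can be checked for any n. The helper below is a small sketch; the function name and the dictionary keys are hypothetical.

```python
from math import ceil


def bplus_bounds(n):
    """Return (min, max) occupancy for a B+ tree whose nodes hold up to n pointers."""
    return {
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "internal_children": (ceil(n / 2), n),
        "root_min_children": 2,
    }


print(bplus_bounds(5))
```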
UPDATES ON B+ TREES
INSERTION
If the search-key value already appears in the leaf node, we add the new
record to the file and, if necessary, a pointer to the bucket.
DELETION
Using the same technique as for lookup, we find the record to be deleted and
remove it from the file. The search-key value is removed from the leaf node
if there is no bucket associated with that search-key value or if the bucket
becomes empty as a result of the deletion.
B+ TREE FILE ORGANIZATION
In a B+ tree file organization, the leaf nodes of the tree store records
instead of storing pointers to records.
[Figure: an example of a B+ tree file organization.]
Since records are usually larger than pointers, the maximum
number of records that can be stored in a leaf node is less than the
number of pointers in a non-leaf node.
MAIN GOALS OF A B+ TREE:
• Sorted intermediary and leaf nodes
Since it is a balanced search tree, the keys in all nodes are kept sorted.
• Fast traversal and quick search
Any record should be fetched very quickly. This is achieved by keeping the
tree balanced, with all leaf nodes at the same distance from the root.
• No overflow pages
A B+ tree allows the intermediary and leaf nodes to be partially filled, down to
a percentage defined when the B+ tree is designed. In our example above, the
intermediary node with 108 underflows, and the leaf nodes are not partially filled,
hence they overflow. An ideal B+ tree should have no overflow or underflow
except at the root node.
Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node
may have up to m children) in which:
1. The number of keys in each non-leaf node is one less than the
number of its children, and these keys partition the keys in the
children in the fashion of a search tree.
2. All leaves are on the same level.
3. All non-leaf nodes except the root have at least $\lceil m/2 \rceil$ children.
4. The root is either a leaf node, or it has from two to m children.
5. A leaf node contains no more than m – 1 keys.
The number m should always be odd.
An example B-Tree
[Figure: a B-tree of order 5 containing 26 items]
Root: [26]
Level 1: [6 12] [42 51 62]
Leaves: [1 2 4] [7 8] [13 15 18 25] [27 29] [45 46 48 53] [55 60] [64 70 90]
Note that all the leaves are at the same level
Constructing a B-tree
• Suppose we start with an empty B-tree and keys arrive in the
following order: 1 12 8 2 25 5 14 28 17 7 52 16 48 68
3 26 29 53 55 45
• We want to construct a B-tree of order 5
• The first four items go into the root: 1 2 8 12
• To put the fifth item in the root would violate condition 5
• Therefore, when 25 arrives, pick the middle key to make a new
root
Inserting into a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big, split the leaf
into two, promoting the middle key to the leaf’s parent
• If this would result in the parent becoming too big, split the
parent into two, promoting the middle key
• This strategy might have to be repeated all the way to the top
• If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher
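The insertion steps above can be sketched as a small in-memory B-tree. This is a minimal illustration, not a disk-based implementation: the class and method names are hypothetical, nodes are plain Python objects, and only insertion is covered. The key sequence is the one used in the construction example.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list means this node is a leaf

    @property
    def is_leaf(self):
        return not self.children


class BTree:
    def __init__(self, order=5):
        self.order = order  # maximum number of children per node
        self.root = Node()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:
            # The root was split: promote the middle key to a new root.
            middle, right = split
            self.root = Node([middle], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (middle_key, right_node) if node split."""
        pos = bisect_left(node.keys, key)
        if node.is_leaf:
            node.keys.insert(pos, key)
        else:
            split = self._insert(node.children[pos], key)
            if split is not None:
                middle, right = split
                node.keys.insert(pos, middle)
                node.children.insert(pos + 1, right)
        if len(node.keys) <= self.order - 1:
            return None  # the node still fits
        # Too big: split around the middle key and promote it to the parent.
        mid = len(node.keys) // 2
        middle = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return middle, right

    def in_order(self):
        """Return all keys in sorted order by walking the tree."""
        out = []

        def walk(node):
            for i, key in enumerate(node.keys):
                if not node.is_leaf:
                    walk(node.children[i])
                out.append(key)
            if not node.is_leaf:
                walk(node.children[-1])

        walk(self.root)
        return out

    def leaf_depths(self):
        """Return the set of depths at which leaves occur (one value if balanced)."""
        depths = set()

        def walk(node, depth):
            if node.is_leaf:
                depths.add(depth)
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 0)
        return depths


KEYS = [1, 12, 8, 2, 25, 5, 14, 28, 17, 7, 52, 16, 48, 68, 3, 26, 29, 53, 55, 45]
tree = BTree(order=5)
for k in KEYS:
    tree.insert(k)
print(tree.root.keys, tree.leaf_depths())
```

Because splits only ever propagate upward and the tree grows at the root, all leaves stay at the same depth without any explicit rebalancing step.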
Removal from a B-tree
• During insertion, the key always goes into a leaf. For deletion
we wish to remove from a leaf. There are three possible ways
we can do this:
• 1 - If the key is already in a leaf node, and removing it doesn’t
cause that leaf node to have too few keys, then simply remove
the key to be deleted.
• 2 - If the key is not in a leaf then it is guaranteed (by the nature
of a B-tree) that its predecessor or successor will be in a leaf;
in this case we can delete the key and promote the predecessor
or successor key to the non-leaf deleted key’s position.
Analysis of B-Trees
• The maximum number of items in a B-tree of order m and
height h:
root: $m - 1$
level 1: $m(m - 1)$
level 2: $m^2(m - 1)$
. . .
level h: $m^h(m - 1)$
• So, the total number of items is
$(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = [(m^{h+1} - 1)/(m - 1)](m - 1) = m^{h+1} - 1$
• When m = 5 and h = 2 this gives $5^3 - 1 = 124$
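The closed form can be checked numerically by summing the per-level maxima. The function name below is hypothetical; the calculation itself is exactly the sum shown above.

```python
def max_items(m, h):
    """Maximum number of keys in a B-tree of order m and height h."""
    # Level i has at most m**i nodes, each holding at most m - 1 keys.
    per_level = [m ** level * (m - 1) for level in range(h + 1)]
    total = sum(per_level)
    # The geometric series collapses to m**(h + 1) - 1.
    assert total == m ** (h + 1) - 1
    return total


print(max_items(5, 2))
```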
Static Hashing:
In static hashing, when a search-key value is provided, the
hash function always computes the same address.
For example, if a mod-4 hash function is used, it can
generate only 4 values (0 to 3).
The output address is always the same for a given search key.
The number of buckets provided remains unchanged at all times.
Operation
Insertion − When a record is to be entered using static hashing,
the hash function h computes the bucket address for search key K,
where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be
retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is
stored.
Delete − This is simply a search followed by a
deletion operation.
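The three operations can be sketched with a fixed array of buckets and a mod-4 hash function. This is an in-memory illustration only: the class name is hypothetical, keys are assumed to be integers, and buckets are allowed to grow without modelling overflow chains.

```python
class StaticHashFile:
    """Fixed number of buckets; bucket address = h(K) = K mod num_buckets."""

    def __init__(self, num_buckets=4):
        self.num_buckets = num_buckets
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        return key % self.num_buckets

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def search(self, key):
        # The same hash function locates the bucket; scan it for the key.
        for k, record in self.buckets[self.h(key)]:
            if k == key:
                return record
        return None

    def delete(self, key):
        # A delete is a search followed by removal of the matching entry.
        bucket = self.buckets[self.h(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


f = StaticHashFile(4)
f.insert(10, "record A")
f.insert(14, "record B")  # 10 and 14 both hash to bucket 2
print(f.h(10), f.h(14), f.search(14))
```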
Dynamic Hashing
The problem with static hashing is that it does not
expand or shrink dynamically as the size of the database grows
or shrinks.
Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on demand.
Dynamic hashing is also known as extendible hashing.
The hash function, in dynamic hashing, is made to produce
a large number of values, of which only a few are used initially.
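One common way to realize this idea is a directory that doubles on demand while buckets split individually. The sketch below is a simplified in-memory illustration under stated assumptions: class and method names are hypothetical, only insertion and lookup are shown (buckets are never merged), and keys are assumed to have distinct hash values, since keys with identical hashes could never be separated by splitting.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket uses
        self.items = {}


class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 0  # number of hash bits the directory uses
        self.directory = [Bucket(0)]

    def _slot(self, key):
        # Only the low-order global_depth bits of the hash value are used.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket):
        if bucket.local_depth == self.global_depth:
            # Double the directory; the new half points at the same buckets.
            self.directory += self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Slots whose newly used bit is 1 now point at the sibling bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = sibling
        # Redistribute the old records between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


table = ExtendibleHash(bucket_capacity=2)
for k in range(20):
    table.put(k, k * k)
print(table.global_depth, len(table.directory), table.get(7))
```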
Multiple-Key Access
Use multiple indices for certain types of queries.
Example:
select ID
from instructor
where dept_name = 'Finance' and salary = 80000
Possible strategies for processing the query using indices on
single attributes:
1. Use the index on dept_name to find instructors with
department name Finance; test salary = 80000.
2. Use the index on salary to find instructors with a salary
of $80000; test dept_name = 'Finance'.
3. Use the dept_name index to find pointers to all records
pertaining to the 'Finance' department.
Similarly use the index on salary. Take the
intersection of both sets of pointers obtained.
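The third strategy, intersecting the pointer sets from two single-attribute indices, can be sketched as follows. The rows, index layout and function names are hypothetical; record pointers are modelled as list positions.

```python
from collections import defaultdict

# Hypothetical instructor rows: (ID, dept_name, salary).
INSTRUCTORS = [
    (101, "Finance", 80000),
    (102, "Finance", 90000),
    (103, "Physics", 80000),
    (104, "Finance", 80000),
]


def build_index(rows, column):
    """Map each value of the given column to the set of row pointers holding it."""
    index = defaultdict(set)
    for pointer, row in enumerate(rows):
        index[row[column]].add(pointer)
    return index


DEPT_INDEX = build_index(INSTRUCTORS, 1)
SALARY_INDEX = build_index(INSTRUCTORS, 2)


def ids_by_intersection(dept_name, salary):
    """Intersect the two pointer sets, then fetch only the matching rows."""
    pointers = DEPT_INDEX.get(dept_name, set()) & SALARY_INDEX.get(salary, set())
    return sorted(INSTRUCTORS[p][0] for p in pointers)


print(ids_by_intersection("Finance", 80000))
```

The intersection pays off when each condition alone matches many records but few records satisfy both, because only the rows in the intersection have to be fetched.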
Data storage and indexing

More Related Content

PPTX
Exception handling in java
PPTX
An introduction to reinforcement learning
PPTX
Starting a code Club proposal
PPTX
Nepal Earthquake - April 2015
PDF
Classification Based Machine Learning Algorithms
PDF
PostgreSQL Deep Internal
PPTX
Data partitioning
PPTX
Physical Unit Operations
Exception handling in java
An introduction to reinforcement learning
Starting a code Club proposal
Nepal Earthquake - April 2015
Classification Based Machine Learning Algorithms
PostgreSQL Deep Internal
Data partitioning
Physical Unit Operations

What's hot (20)

PPT
11. Storage and File Structure in DBMS
PPTX
Distributed database management system
PPTX
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
PPTX
Query processing and optimization (updated)
PPTX
Introduction to distributed database
PPT
15. Transactions in DBMS
PPT
Data models
PPTX
Distributed database
PPT
Query processing-and-optimization
PPT
Distributed & parallel system
PPTX
Cohesion and coupling
PPT
16. Concurrency Control in DBMS
PPTX
Distributed file system
PPTX
Challenges of Conventional Systems.pptx
PPT
Distributed Database System
PPTX
two tier and three tier
PPTX
Client server architecture
PPTX
Database System Architectures
PPTX
Data Modeling PPT
PDF
The CAP Theorem
11. Storage and File Structure in DBMS
Distributed database management system
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
Query processing and optimization (updated)
Introduction to distributed database
15. Transactions in DBMS
Data models
Distributed database
Query processing-and-optimization
Distributed & parallel system
Cohesion and coupling
16. Concurrency Control in DBMS
Distributed file system
Challenges of Conventional Systems.pptx
Distributed Database System
two tier and three tier
Client server architecture
Database System Architectures
Data Modeling PPT
The CAP Theorem
Ad

Similar to Data storage and indexing (20)

PPT
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
PPT
StorageIndexing_CS541.ppt indexes for dtata bae
PPT
INDEXING METHODS USED IN DATABASE STORAGE
PPTX
normalization process in relational data base management
PPT
Unit 4 data storage and querying
PPTX
DB LECTURE 4 INDEXINGS PPT NOTES.pptx
PPTX
DBMS-Unit5-PPT.pptx important for revision
PPT
3620121datastructures.ppt
PPTX
UNIT III.pptx
PPT
Storage struct
PPTX
CS 2212- UNIT -4.pptx
PPTX
lecture 2 notes indexing in application of database systems.pptx
PPTX
Overview of Storage and Indexing ...
PPT
Chapter13
PPT
Csci12 report aug18
PPT
Indexing and hashing
PPT
Unit 08 dbms
PPTX
overview of storage and indexing BY-Pratik kadam
PPTX
File organization and introduction of DBMS
PDF
fileorganizationandintroductionofdbms-210313163900.pdf
StorageIndexing_Main memory (RAM) for currently used data. Disk for the main ...
StorageIndexing_CS541.ppt indexes for dtata bae
INDEXING METHODS USED IN DATABASE STORAGE
normalization process in relational data base management
Unit 4 data storage and querying
DB LECTURE 4 INDEXINGS PPT NOTES.pptx
DBMS-Unit5-PPT.pptx important for revision
3620121datastructures.ppt
UNIT III.pptx
Storage struct
CS 2212- UNIT -4.pptx
lecture 2 notes indexing in application of database systems.pptx
Overview of Storage and Indexing ...
Chapter13
Csci12 report aug18
Indexing and hashing
Unit 08 dbms
overview of storage and indexing BY-Pratik kadam
File organization and introduction of DBMS
fileorganizationandintroductionofdbms-210313163900.pdf
Ad

More from pradeepa velmurugan (10)

PPT
PPT
Multimedia compression
PPTX
software design
PPTX
DIVIDE AND CONQUER
PPTX
IMAGE COMPRESSION
PPTX
File handling in input and output
PPTX
Analysis Of Attribute Revelance
PPTX
PPTX
Instruction codes
PPT
Research Methodology
Multimedia compression
software design
DIVIDE AND CONQUER
IMAGE COMPRESSION
File handling in input and output
Analysis Of Attribute Revelance
Instruction codes
Research Methodology

Recently uploaded (20)

PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
PDF
Classroom Observation Tools for Teachers
PDF
Anesthesia in Laparoscopic Surgery in India
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PPTX
Cell Structure & Organelles in detailed.
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
Cell Types and Its function , kingdom of life
PPTX
Institutional Correction lecture only . . .
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
PPTX
Pharma ospi slides which help in ospi learning
PDF
RMMM.pdf make it easy to upload and study
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
Supply Chain Operations Speaking Notes -ICLT Program
Final Presentation General Medicine 03-08-2024.pptx
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
Classroom Observation Tools for Teachers
Anesthesia in Laparoscopic Surgery in India
FourierSeries-QuestionsWithAnswers(Part-A).pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Cell Structure & Organelles in detailed.
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Cell Types and Its function , kingdom of life
Institutional Correction lecture only . . .
Module 4: Burden of Disease Tutorial Slides S2 2025
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
Mark Klimek Lecture Notes_240423 revision books _173037.pdf
Pharma ospi slides which help in ospi learning
RMMM.pdf make it easy to upload and study
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Abdominal Access Techniques with Prof. Dr. R K Mishra

Data storage and indexing

  • 1. RELATIONAL DATABASE MANAGEMENT SYSTEM UNIT – III DATA STORAGE AND INDEXING
  • 2. 1 Storage And File Structure Why do we need to know about storage/file structure Many database technologies are developed to utilize the Storage architecture/hierarchy Data in the database needs to be organized and stored/retrieved efficiently
  • 3. Storage Hierarchy Magnetic Tape Optical Disk Magnetic Disk Flash Memory Cache unit priceMemory Volatile primary storage Non-volatile speed Secondary storage Tertiary storage
  • 4. Primary Storage (Volatile) Cache Speed: 7 to 20 ns (1 nanosecond = 10–9 seconds) Capacity: A typical PC level 2 cache 64KB-2 MB. Within processors, level 1 cache usually ranges in size from 8 KB to 64 KB. Main memory Speed: 10s to 100s of nanoseconds; Capacity: Up to a few Gigabytes widely used currently per-byte costs have decreased roughly factor of 2 every 2 3 years)
  • 5. Secondary Storage (Non-volatile) Flash memory Speed: Read speed similar to main memory, write is much slower Capacity: 32M to 512M currently Forms: SmartMedia, memory stick, secure digital, BIOS Cost: roughly same as main memory Magnetic-disk Capacities: up to roughly 100 GB currently Growing constantly and rapidly with technology improvements.
  • 6. 1/14/2005 Yan Huang - CSCI5330 Database Implementation – Storage and File Structure Tertiary Storage (Non-volatile)  Optical storage  CD-ROM (640 MB) and DVD (4.7 to 17 GB) most popular forms  CD-RW, DVD-RW, and DVD-RAM  Reads and writes are slower than with magnetic disk  Juke-box systems, with large numbers of removable disks, a few drives, and a mechanism for automatic loading/unloading of disks available for storing large volumes of data
  • 7. Indexing: *Indexing in database systems is similar to what we see in books. * Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on which the indexing has been done. *Indexing is defined based on its indexing attributes. Indexing can be of the following types *Primary Index - Primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
  • 8. Indexing Types: Secondary Index - Secondary index may be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values. Clustering Index - Clustering index is defined on an ordered data file. The data file is ordered on a non-key field. Ordered Indexing. Dense Index - In dense index, there is an index record for every search key value in the database. This makes searching faster but requires more space to store index records itself. Index records contain search key value and a pointer to the actual record on the disk.
  • 10. Sparse Index: In sparse index, index records are not created for every search key. An index record here contains a search key and an actual pointer to the data on the disk. To search a record, we first proceed by index record and reach at the actual location of the data. If the data we are looking for is not where we directly reach by following the index, then the system starts sequential search until the desired data is found.
  • 12. Multilevels Indexing: Index records comprise search-key values and data pointers. Multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. If single-level index is used, then a large size index cannot be kept in memory which leads to multiple disk accesses . Multi-level Index helps in breaking down the index into several smaller indices in order to make the outermost level so small that it can be saved in a single disk block,
  • 14. Disk: Hard disk drives are the most common secondary storage devices in present computer systems. These are called magnetic disks because they use the concept of magnetization to store information. Hard disks are formatted in a well-defined order to store data efficiently. A hard disk plate has many concentric circles on it, called tracks. Every track is further divided into sectors. A sector on a hard disk typically stores 512 bytes of data.
  • 15. Disk
  • 16. Disk Subsystem Disk interface standards families • ATA (AT adaptor) range of standards • SCSI (Small Computer System Interconnect) range of standards.
  • 17. Disk Speed Seek time (milliseconds) Rotation time/latency milliseconds Data-transfer rate (4-8MB/sec) Typical numbers:  16,000 tracks per platter  sectors per track: 200 – 400  512 bytes per sector  4-16KB per block  5,400 - 15,000 r p m Access time = seek time + latency Discuss ways to improve disk reading speed
  • 18. Redundant Array of Independent Disks RAID or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage devices and use them as a single storage media. RAID consists of an array of disks in which multiple disks are connected together to achieve different goals. RAID levels define the use of disk arrays.
  • 19. RAID 0 In this level, a striped array of disks is implemented. The data is broken down into blocks and the blocks are distributed among disks. Each disk receives a block of data to write/read in parallel. It enhances the speed and performance of the storage device. There is no parity and backup in Level 0.
  • 20. RAID1 RAID 1 uses mirroring techniques. When data is sent to a RAID controller, it sends a copy of data to all the disks in the array. RAID level 1 is also called mirroring and provides 100% redundancy in case of a failure.
  • 21. RAID 2 RAID 2 records Error Correction Code using Hamming distance for its data, striped on different disks. Like level 0, each data bit in a word is recorded on a separate disk and ECC codes of the data words are stored on a different set disks. Due to its complex structure and high cost, RAID 2 is not commercially available.
  • 22. RAID3 RAID 3 stripes the data onto multiple disks. The parity bit generated for data word is stored on a different disk. This technique makes it to overcome single disk failures.
  • 23. RAID 4 In this level, an entire block of data is written onto data disks and then the parity is generated and stored on a different disk. Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping. Both level 3 and level 4 require at least three disks to implement RAID.
  • 24. RAID 5 RAID 5 writes whole data blocks onto different disks, but the parity bits generated for data block stripe are distributed among all the data disks rather than storing them on a different dedicated disk.
  • 25. RAID 6 RAID 6 is an extension of level 5. In this level, two independent parities are generated and stored in distributed fashion among multiple disks. Two parities provide additional fault tolerance. This level requires at least four disk drives to implement RAID.
  • 26. File Organization File Organization defines how file records are mapped onto disk blocks. We have four types of File Organization to organize file records
  • 27. Heap File Organization When a file is created using Heap File Organization, the Operating System allocates memory area to that file without any further accounting details. File records can be placed anywhere in that memory area. It is the responsibility of the software to manage the records. Heap File does not support any ordering, sequencing, or indexing on its own.
  • 28. Sequential File Organization Every file record contains a data field (attribute) to uniquely identify that record. In sequential file organization, records are placed in the file in some sequential order based on the unique key field or search key. Practically, it is not possible to store all the records sequentially in physical form.
  • 29. Hash File Organization Hash File Organization uses Hash function computation on some fields of the records. The output of the hash function determines the location of disk block where the records are to be placed.
  • 30. Clustered File Organization Clustered file organization is not considered good for large databases. In this mechanism, related records from one or more relations are kept in the same disk block, that is, the ordering of records is not based on primary key or search key.
  • 31. Hashing: Hashing uses hash functions with search keys as parameters to generate the address of a data record. Bucket A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically stores one complete disk block, which in turn can store one or more records. A hash function, h, is a mapping function that maps all the set of search-keys K to the address where actual records are placed. It is a function from search keys to bucket addresses.
  • 32. B+ Tree: B+ tree is a (key, value) storage method in a tree like structure. B+ tree has one root, any number of intermediary nodes (usually one) and a leaf node. Here all leaf nodes will have the actual records stored. Intermediary nodes will have only pointers to the leaf nodes; it not has any data. Any node will have only two leaves. This is the basic of any B+ tree.
  • 33. STRUCTURE OF B+ TREE  A B+ tree index is a multilevel indexes , but it has a structure that differs from than of the multilevel index-sequential file.  The bucket structure is used only if the search key does not from a primary key and if the file is not sorted in the search key value in the order.
  • 34. QUERIES ON B+ TREE  Process queries using a b+ tree . To find all the records with a search-key value of k.
  • 35.  Leaf nodes must have between 2 and 4 values([(n-1)/2)] and n-1 , with n=5).  Non-leaf nodes other than root must have between 3 and 5 children([(n/2)]and n with n=5).  Root must have at least 2 children.
  • 36. UPADATES ON B+ TREES INSERTION If the search key value already appears in the leaf node , we add the new record to the file and , if necessary , a pointer to the bucket.
  • 37. DELETION Using same technique as for lookup , we find the record to be deleted and remove it from the file . The search key value is removed from the leaf node if there is no bucket associated with that search key value or if the bucket becomes empty as a result of the deletion.
  • 38. B+TREE FILE ORGANIZATION In a B+ tree file organization , the leaf nodes of the tree store records instead of storing pointers to records . An example of a B+ tree file organization . Since records are usually larger than pointers , the maximum number of records that can be stored in the leaf nodes is less than the number of pointers in a non leaf node.
  • 39. MAIN GOALS OF A B+ TREE:  Sorted intermediary and leaf nodes: the keys in every node are kept in sorted order.  Fast traversal and quick search: any record should be fetched very quickly. This is achieved by keeping the tree balanced, with all leaf nodes at the same distance from the root.  No overflow pages: a B+ tree allows intermediary and leaf nodes to be partially filled, up to a fill percentage chosen when the tree is designed. In the example, the intermediary node with 108 is in underflow, and the leaf nodes are completely filled rather than partially filled, which is treated as overflow. In an ideal B+ tree, no node other than the root should be in overflow or underflow.
  • 40. Definition of a B-tree A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which: 1. The number of keys in each non-leaf node is one less than the number of its children, and these keys partition the keys in the children in the fashion of a search tree. 2. All leaves are on the same level. 3. All non-leaf nodes except the root have at least $\lceil m/2 \rceil$ children. 4. The root is either a leaf node, or it has from two to m children. 5. A leaf node contains no more than m – 1 keys. The number m should always be odd.
  • 41. An example B-Tree [Figure: a B-tree of order 5 containing 26 items.] Note that all the leaves are at the same level.
  • 42. Constructing a B-tree  Suppose we start with an empty B-tree and keys arrive in the following order: 1 12 8 2 25 5 14 28 17 7 52 16 48 68 3 26 29 53 55 45  We want to construct a B-tree of order 5  The first four items go into the root: 1 2 8 12  To put the fifth item in the root would violate condition 5  Therefore, when 25 arrives, pick the middle key to make a new root
  • 43. Inserting into a B-Tree  Attempt to insert the new key into a leaf  If this would result in that leaf becoming too big, split the leaf into two, promoting the middle key to the leaf’s parent  If this would result in the parent becoming too big, split the parent into two, promoting the middle key  This strategy might have to be repeated all the way to the top  If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher
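The insertion steps above can be sketched as follows. This is a minimal illustration, not a production implementation: the names `Node`, `_insert`, `insert` and `build` are hypothetical, keys are assumed to be distinct, and a node is split as soon as it holds m keys (one more than the m – 1 allowed).

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list for a leaf

    @property
    def is_leaf(self):
        return not self.children


def _insert(node, key, m):
    """Insert key below node; return (middle_key, right_node) if node split."""
    if node.is_leaf:
        insort(node.keys, key)  # keep the leaf's keys sorted
    else:
        i = bisect_left(node.keys, key)
        split = _insert(node.children[i], key, m)
        if split:
            # The child split: take its promoted middle key and new sibling.
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None  # node still fits
    # Too big: split around the middle key and promote it to the caller.
    h = len(node.keys) // 2
    mid = node.keys[h]
    right = Node(node.keys[h + 1:], node.children[h + 1:])
    node.keys = node.keys[:h]
    node.children = node.children[:h + 1]
    return mid, right


def insert(root, key, m=5):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split:
        # The root itself split, so the tree grows one level higher.
        mid, right = split
        return Node([mid], [root, right])
    return root


def build(keys, m=5):
    root = Node()
    for k in keys:
        root = insert(root, k, m)
    return root


first_five = build([1, 12, 8, 2, 25])
print(first_five.keys)                       # [8]
print([c.keys for c in first_five.children])  # [[1, 2], [12, 25]]
```

With order 5, the fifth key overflows the root, the middle key 8 is promoted, and the remaining keys are divided between two leaves.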
  • 44. Removal from a B-tree  During insertion, the key always goes into a leaf. For deletion we wish to remove from a leaf. There are three possible ways we can do this:  1 - If the key is already in a leaf node, and removing it doesn’t cause that leaf node to have too few keys, then simply remove the key to be deleted.  2 - If the key is not in a leaf then it is guaranteed (by the nature of a B-tree) that its predecessor or successor will be in a leaf -- in this case we can delete the key and promote the predecessor or successor key to the non-leaf deleted key’s position.  3 - If (1) or (2) leaves a leaf node with too few keys, then either borrow a key from an adjacent sibling through the parent, or merge the leaf with a sibling.
  • 45. Analysis of B-Trees  The maximum number of items in a B-tree of order m and height h: root: $m - 1$; level 1: $m(m - 1)$; level 2: $m^2(m - 1)$; ...; level h: $m^h(m - 1)$  So, the total number of items is $(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = [(m^{h+1} - 1)/(m - 1)](m - 1) = m^{h+1} - 1$  When $m = 5$ and $h = 2$ this gives $5^3 - 1 = 124$
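The closed form can be checked numerically by summing the per-level maxima. The function name `max_items` is hypothetical; the formula is the one derived on the slide.

```python
def max_items(m: int, h: int) -> int:
    """Maximum number of keys in a B-tree of order m and height h."""
    # Level i has at most m**i nodes, each holding at most m - 1 keys.
    per_level = [(m ** level) * (m - 1) for level in range(h + 1)]
    total = sum(per_level)
    # The geometric series collapses to m**(h + 1) - 1.
    assert total == m ** (h + 1) - 1
    return total


print(max_items(5, 2))  # 124  (4 + 20 + 100)
```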
  • 46. Static Hashing: In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it generates only 4 values (0 to 3). The output address is always the same for a given key, and the number of buckets remains unchanged at all times. Operation: Insertion − When a record is to be entered using static hashing, the hash function h computes the bucket address for search key K, where the record will be stored.
  • 47. Bucket address = h(K) Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address of the bucket where the data is stored. Delete − This is simply a search followed by a deletion operation.
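The three operations can be sketched with a fixed list of buckets. This is an illustrative sketch under assumptions: the class name `StaticHashFile` is hypothetical, search keys are assumed to be non-negative integers, h(K) = K mod 4, and each bucket is modelled as an unbounded Python list rather than a fixed-size disk block.

```python
class StaticHashFile:
    def __init__(self, num_buckets=4):
        # The number of buckets is fixed for the lifetime of the file.
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        """Bucket address = h(K); here simply K mod (number of buckets)."""
        return key % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def search(self, key):
        # The same hash function locates the bucket; then scan inside it.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]

    def delete(self, key):
        # A search followed by removal; returns how many records were removed.
        bucket = self.buckets[self.h(key)]
        before = len(bucket)
        bucket[:] = [(k, r) for k, r in bucket if k != key]
        return before - len(bucket)


f = StaticHashFile()
f.insert(10, "a")   # 10 mod 4 = 2
f.insert(14, "b")   # 14 mod 4 = 2, same bucket as key 10
f.insert(7, "c")    # 7 mod 4 = 3
print(f.search(14))  # ['b']
print(f.delete(10))  # 1
```

Keys 10 and 14 land in the same bucket, which shows why a search still has to compare keys inside the bucket after hashing.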
  • 49. Dynamic Hashing The problem with static hashing is that it does not expand or shrink dynamically as the size of the database grows or shrinks. Dynamic hashing provides a mechanism in which data buckets are added and removed dynamically and on demand. Dynamic hashing is also known as extendible hashing. The hash function in dynamic hashing is made to produce a large number of values, of which only a few are used initially.
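One way to picture the idea is a directory indexed by only a few bits of the hash value, which doubles when a full bucket needs more bits. The sketch below is an assumption-laden illustration rather than the method from the slides: the names `Bucket` and `ExtendibleHash` are hypothetical, the low-order bits of Python's built-in `hash` are used, keys are assumed to have distinct hash values, and only growth is shown (buckets are never merged or removed).

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # number of hash bits this bucket depends on
        self.capacity = capacity
        self.items = {}


class ExtendibleHash:
    def __init__(self, capacity=2):
        self.capacity = capacity
        self.global_depth = 1     # hash bits currently used by the directory
        self.directory = [Bucket(1, capacity), Bucket(1, capacity)]

    def _index(self, key):
        # Use only the low global_depth bits of the (much larger) hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._index(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._index(key)]
            if key in bucket.items or len(bucket.items) < bucket.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)   # bucket is full: add a bucket and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Directory is too coarse: double it, so one more bit is used.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        new = Bucket(bucket.depth, self.capacity)
        bit = 1 << (bucket.depth - 1)
        # Directory slots whose new bit is 1 now point at the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new
        # Redistribute the old bucket's entries between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._index(k)].items[k] = v


table = ExtendibleHash(capacity=2)
for k in range(16):
    table.put(k, k * 10)
print(table.get(9), table.global_depth)  # 90 3
```

The directory starts out using a single bit and ends up using three, which mirrors the statement that only a few of the hash function's values are used initially.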
  • 51. Multiple-Key Access Use multiple indices for certain types of queries. Example: select ID from instructor where dept_name = 'Finance' and salary = 80000 Possible strategies for processing the query using indices on single attributes:
  • 52. Multiple-Key Access 1. Use the index on dept_name to find instructors with department name 'Finance'; test salary = 80000. 2. Use the index on salary to find instructors with a salary of 80000; test dept_name = 'Finance'. 3. Use the dept_name index to find pointers to all records pertaining to the 'Finance' department. Similarly use the index on salary. Take the intersection of both sets of pointers obtained.
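The three strategies can be compared on a toy table. Everything here apart from the column names is a hypothetical illustration: the sample rows are invented, each index is modelled as a dictionary from attribute value to a set of row positions (standing in for record pointers), and the function names are assumptions.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the query.
instructor = [
    {"ID": 101, "dept_name": "Finance", "salary": 80000},
    {"ID": 102, "dept_name": "Finance", "salary": 90000},
    {"ID": 103, "dept_name": "Physics", "salary": 80000},
    {"ID": 104, "dept_name": "Finance", "salary": 80000},
]


def build_index(rows, attr):
    """Single-attribute index: value -> set of row positions (pointers)."""
    index = defaultdict(set)
    for pos, row in enumerate(rows):
        index[row[attr]].add(pos)
    return index


dept_index = build_index(instructor, "dept_name")
salary_index = build_index(instructor, "salary")


def strategy_1(dept, salary):
    # Use the dept_name index, then test salary on each fetched record.
    hits = dept_index.get(dept, set())
    return sorted(instructor[p]["ID"] for p in hits
                  if instructor[p]["salary"] == salary)


def strategy_2(dept, salary):
    # Use the salary index, then test dept_name on each fetched record.
    hits = salary_index.get(salary, set())
    return sorted(instructor[p]["ID"] for p in hits
                  if instructor[p]["dept_name"] == dept)


def strategy_3(dept, salary):
    # Intersect the two pointer sets before fetching any record.
    hits = dept_index.get(dept, set()) & salary_index.get(salary, set())
    return sorted(instructor[p]["ID"] for p in hits)


print(strategy_1("Finance", 80000))  # [101, 104]
print(strategy_3("Finance", 80000))  # [101, 104]
```

All three return the same rows; they differ in how many records have to be fetched before the second condition is applied.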