Hashing Techniques
• Hashing in DBMS is a technique to quickly locate a data record in a
database regardless of the database's size. For large databases
containing millions of records, index-based data structures become
inefficient, because searching for a specific record through an index
takes longer as the index grows.
What is Hashing?
• The hashing technique utilizes an auxiliary hash table to store the data records
using a hash function. There are three key components in hashing:
• Hash Table: A hash table is an array or similar data structure whose size is
determined by the total volume of data records in the database. Each memory
location in a hash table is called a 'bucket' (or hash index); it stores a data
record's exact location and is accessed through a hash function.
• Bucket: A bucket is a memory location (index) in the hash table that stores the
data record. These buckets generally store a disk block, which in turn stores
multiple records. A bucket is also known as a hash index.
• Hash Function: A hash function is a mathematical equation or algorithm that takes
a data record's primary key as input and computes the hash index as output.
Hash Function
• A hash function is an algorithm that computes the index, or
location, in the hash table where a data record is to be stored, so
that the record can be accessed efficiently later. The hash function
is the most crucial component in determining the speed of
fetching data.
Internal Hashing
• For internal files, hashing is typically implemented as a hash table
through the use of an array of records. Suppose that the array index
range is from 0 to M − 1; then we have M slots whose addresses
correspond to the array indexes.
• We choose a hash function that transforms the hash field value into
an integer between 0 and M − 1. One common hash function is
h(K) = K mod M, which returns the remainder of an integer hash field
value K after division by M; this value is then used for the record
address.
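As a minimal sketch of this division-remainder function (assuming integer keys and a hypothetical table size M = 7):

```python
M = 7  # number of slots in the internal hash table (hypothetical choice)

def h(K: int) -> int:
    """Map an integer hash field value K to a record address in [0, M - 1]."""
    return K % M

# Note that distinct keys can share a slot:
# 15, 22, and 8 all map to slot 1 (15 mod 7 = 22 mod 7 = 8 mod 7 = 1).
```

M is often chosen to be a prime number, which tends to spread keys more uniformly over the slots.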
Two simple hashing algorithms:
(a) Applying the mod hash function to a character string K:
temp ← 1;
for i ← 1 to 20 do temp ← temp * code(K[i]) mod M;
hash_address ← temp mod M;
(b) Collision resolution by open addressing:
i ← hash_address(K); a ← i;
if location i is occupied
then begin i ← (i + 1) mod M;
while (i ≠ a) and location i is occupied
do i ← (i + 1) mod M;
if (i = a) then all positions are full
else new_hash_address ← i;
end;
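The two algorithms above can be sketched in Python (assuming a fixed 20-character key padded with blanks, `ord()` standing in for `code()`, and a hypothetical table size M = 101):

```python
M = 101  # table size (a prime, as is conventional; hypothetical choice)

def mod_hash(key: str) -> int:
    """(a) Apply the mod hash function to a character string key."""
    key = key.ljust(20)               # pad to a fixed 20-character field
    temp = 1
    for ch in key:
        temp = (temp * ord(ch)) % M   # code(K[i]) = ord(ch)
    return temp % M

def open_addressing_insert(table: list, key: str) -> int:
    """(b) Resolve a collision by linearly probing for the next free slot."""
    i = a = mod_hash(key)
    if table[i] is not None:
        i = (i + 1) % M
        while i != a and table[i] is not None:
            i = (i + 1) % M
        if i == a:
            raise RuntimeError("all positions are full")
    table[i] = key
    return i
```

A second record whose key hashes to an occupied slot simply lands in the next free slot, wrapping around the end of the array.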
• Other hash functions can be used. One technique, called
folding, involves applying an arithmetic function such as
addition, or a logical function such as exclusive OR (XOR), to
different portions of the hash field value to calculate the hash address.
• For example, with an address space from 0 to 999 to store
1,000 keys, a 6-digit key 235469 may be folded by splitting it
into 235 and 469, reversing the second half, and storing the key at
the address (235 + 964) mod 1000 = 199.
• Another technique involves picking some digits of the hash
field value (for instance, the third, fifth, and eighth digits) to
form the hash address. For example, storing 1,000 employees
with 9-digit Social Security numbers in a hash file with
1,000 positions would give the Social Security number 301-67-
8923 a hash value of 172 by this hash function.
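Both techniques can be sketched as follows (the folding boundary, the reversal of the second half, and the chosen digit positions follow the two examples above):

```python
def fold_hash(key: int) -> int:
    """Fold a 6-digit key: split into two 3-digit halves,
    reverse the second half, add, and take mod 1000."""
    s = f"{key:06d}"
    first, second = s[:3], s[3:]
    return (int(first) + int(second[::-1])) % 1000

def digit_pick_hash(ssn: str) -> int:
    """Form the address from the 3rd, 5th, and 8th digits of a 9-digit key."""
    digits = ssn.replace("-", "")
    return int(digits[2] + digits[4] + digits[7])

# fold_hash(235469) -> (235 + 964) mod 1000 = 199
# digit_pick_hash("301-67-8923") -> 172
```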
• A collision occurs when the hash field value of a record that
is being inserted hashes to an address that already contains
a different record. In this situation, we must insert the new
record in some other position, since its hash address is
occupied. The process of finding another position is called
collision resolution. There are numerous methods for
collision resolution, including the following:
• 1. Chaining
• 2. Open Addressing
• 3. Multiple Hashing
Chaining
For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions.
Additionally, a pointer field is added to each record location. A
collision is resolved by placing the new record in an unused overflow
location and setting the pointer of the occupied hash address
location to the address of that overflow location.
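An in-memory sketch of chaining with integer keys; a real DBMS would chain disk-resident overflow records via record pointers rather than in-memory nodes, and the table size M = 7 here is a hypothetical choice:

```python
M = 7  # main table size (hypothetical)

class Record:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.next = None  # pointer field to an overflow record

class ChainedHashTable:
    """Collision resolution by chaining: each slot heads a linked
    list of records whose keys hashed to the same address."""
    def __init__(self):
        self.slots = [None] * M

    def insert(self, key, value):
        rec = Record(key, value)
        i = key % M
        rec.next = self.slots[i]   # link any existing chain behind the new record
        self.slots[i] = rec

    def search(self, key):
        rec = self.slots[key % M]
        while rec is not None and rec.key != key:
            rec = rec.next
        return rec.value if rec else None
```

Keys 15 and 22 both hash to slot 1, so they end up on the same chain, yet each remains individually retrievable.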
Open Addressing/Closed Hashing
Also called closed hashing, this approach resolves a collision by
probing for the next empty slot that can store the record. It uses
techniques such as linear probing, quadratic probing, and double hashing.
Multiple Hashing
The program applies a second hash function if the first results in a
collision. If another collision results, the program uses open
addressing, or applies a third hash function and then uses open
addressing if necessary.
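A sketch of this scheme with two hypothetical hash functions and a hypothetical table size M = 11; on a double collision it falls back to linear probing (open addressing) as described above:

```python
M = 11  # table size (prime; hypothetical choice)

def h1(key: int) -> int:
    return key % M

def h2(key: int) -> int:
    return (key // M) % M   # hypothetical second hash function

def multiple_hash_insert(table: list, key: int) -> int:
    """Try h1; on collision try h2; if both addresses are taken,
    fall back to linear probing from h2's address."""
    for h in (h1, h2):
        i = h(key)
        if table[i] is None:
            table[i] = key
            return i
    i = h2(key)
    for _ in range(M):
        i = (i + 1) % M
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("table is full")
```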
External hashing
• Hashing for disk files is called external hashing. To suit the characteristics of disk
storage, the hash address space is made of buckets. Each bucket consists of
either one disk block or a cluster of contiguous (neighbouring) blocks, and can
accommodate a certain number of records.
• A hash function maps a key into a relative bucket number, rather than assigning
an absolute block address to the bucket. A table maintained in the file header
converts the relative bucket number into the corresponding disk block address.
• The collision problem is less severe with buckets, because as many records as
will fit in a bucket can hash to the same bucket without causing any problem. If
the collision problem does occur when a bucket is filled to its capacity, we can
use a variation of the chaining method to resolve it.
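The header table idea can be sketched as follows (the block addresses are hypothetical placeholders, not real disk addresses):

```python
NUM_BUCKETS = 4  # hypothetical number of buckets

# Table kept in the file header: relative bucket number -> disk block address.
# The addresses below are made-up placeholder values.
bucket_directory = {0: 0x5F00, 1: 0x5F80, 2: 0x6000, 3: 0x6080}

def bucket_of(key: int) -> int:
    """Hash a key into a relative bucket number, not an absolute address."""
    return key % NUM_BUCKETS

def block_address(key: int) -> int:
    """Convert the relative bucket number to the corresponding disk block address."""
    return bucket_directory[bucket_of(key)]
```

Because the hash function yields only a relative bucket number, the buckets can be relocated on disk by updating the header table alone, without rehashing any records.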
Dynamic Hashing
Dynamic hashing, also known as extendible hashing, is used to handle
databases whose data sets change frequently. This method offers a
way to add and remove data buckets on demand. This way,
as the number of data records varies, the buckets also grow and
shrink whenever a change is made.
• Properties of Dynamic Hashing
• The buckets vary in size dynamically as changes are made,
offering more flexibility.
• Dynamic hashing helps improve overall performance by minimizing or
completely preventing collisions.
• It has the following major components: data buckets, a flexible hash
function, and directories.
• A flexible hash function generates more dynamic values and keeps
adapting to the requirements of the
database.
• Directories are containers that store pointers to buckets. If problems such
as bucket overflow or bucket skew occur, bucket
splitting is done to maintain efficient retrieval time for data records. Each
directory has a directory id.
Suppose that a new record to be inserted causes overflow in the bucket whose hash values start with 01 (the third bucket). The records in that
bucket will have to be redistributed among two buckets: the first contains all records whose hash values start with 010, and the second contains
all those whose hash values start with 011. Now the two directory entries for 010 and 011 point to the two new distinct buckets; before the split,
they pointed to the same bucket. The local depth of the two new buckets is 3, which is one more than the local depth of the old bucket.
If the global depth is k = 2, the keys are mapped accordingly to the hash index: k bits
starting from the least significant bit (LSB) are taken to map a key to a bucket. That leaves us with the
following 4 possibilities: 00, 01, 10, 11.
Retrieval - To find the bucket containing the search key value K:
1. Compute h(K).
2. Take the first i bits of h(K).
3. Look at the corresponding table entry for this i-bit string.
4. Follow the bucket pointer in the table entry to retrieve the block.
Insertion - To add a new record with the hash key value K:
1. Follow the same procedure as for retrieval, ending up in some bucket.
2. If there is still space in that bucket, place the record in it.
3. If the bucket is full, we must split the bucket and redistribute the records.
4. If a bucket is split, we may need to increase the number of bits we use in the hash.
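The retrieval and insertion procedures above can be sketched as a minimal extendible hash table. This sketch takes directory bits from the LSB side, as in the global-depth example above (the retrieval description uses the leading bits instead; either convention works), with a hypothetical bucket capacity of 2:

```python
class ExtendibleHash:
    """Minimal extendible-hashing sketch: a directory of 2**global_depth
    entries, each pointing to a bucket that carries its own local depth."""
    def __init__(self, bucket_capacity=2):
        self.global_depth = 1
        self.capacity = bucket_capacity
        b0 = {"depth": 1, "keys": []}
        b1 = {"depth": 1, "keys": []}
        self.directory = [b0, b1]

    def _index(self, key: int) -> int:
        # Use the global_depth least significant bits of the (identity) hash.
        return key & ((1 << self.global_depth) - 1)

    def insert(self, key: int):
        bucket = self.directory[self._index(key)]
        if len(bucket["keys"]) < self.capacity:
            bucket["keys"].append(key)
            return
        # Bucket full: double the directory if its depth equals the
        # global depth, then split the bucket and retry.
        if bucket["depth"] == self.global_depth:
            self.directory += self.directory
            self.global_depth += 1
        self._split(bucket)
        self.insert(key)

    def _split(self, bucket):
        new_depth = bucket["depth"] + 1
        b0 = {"depth": new_depth, "keys": []}
        b1 = {"depth": new_depth, "keys": []}
        bit = 1 << (new_depth - 1)
        for k in bucket["keys"]:          # redistribute on the next bit
            (b1 if k & bit else b0)["keys"].append(k)
        for i, b in enumerate(self.directory):
            if b is bucket:               # repoint directory entries
                self.directory[i] = b1 if i & bit else b0

    def search(self, key: int) -> bool:
        return key in self.directory[self._index(key)]["keys"]
```

Inserting 0, 2, and 4 (all ending in bit 0) overflows the first bucket, doubles the directory to global depth 2, and splits the bucket on the next bit, exactly as in the overflow scenario described above.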
Performance issues
• Hashing provides the fastest possible access for retrieving a record based on its hash field value.
However, searching for a record when the hash field value is not available is as expensive as in the case
of a heap file.
• Record deletion can be implemented by removing the record from its bucket. If the bucket has an
overflow chain, we can move one of the overflow records into the bucket to replace the deleted
record. If the record to be deleted is already in the overflow, we simply remove it from the linked list.
• To insert a new record, first, we use the hash function to find the address of the bucket the record
should be in. Then, we insert the record into an available location in the bucket. If the bucket is full,
we will place the record in one of the locations for overflow records.
• The performance of a modification operation depends on two factors: first, the search condition to
locate the record, and second, the field to be modified.
• If the search condition is an equality comparison on the hash field, we can locate the record
efficiently by using the hash function. Otherwise, we must perform a linear search.
• A non-hash field value can be changed and the modified record can be rewritten back to its original
bucket.
• Modifying the hash field value means that the record may move to another bucket, which requires
the deletion of the old record followed by the insertion of the modified one as a new record.


Editor's Notes

  • #2: Hashing is a process of scrambling a piece of information or data beyond recognition. We pass the input through a hash function to calculate the hash value
  • #10: The problem with most hashing functions is that they do not guarantee that distinct values will hash to distinct addresses, because the hash field space (the number of possible values a hash field can take) is usually much larger than the address space (the number of available addresses for records). The hashing function maps the hash field space to the address space.
  • #13: In this situation, we maintain a pointer in each bucket to a linked list of overflow records for the bucket. The pointers in the linked list should be record pointers, which include both a block address and a relative record position within the block.
  • #14: A bucket is either one disk block or cluster of contiguous disk blocks. The collision problem is less severe with buckets, because as many records as will fit in a bucket can hash to the same bucket without causing problems. However, we must make provisions for the case where a bucket is filled to capacity and a new record being inserted hashes to that bucket.
  • #34: The access structure is built on binary representation of hashing function which is string of bits.