Data Structure
Unit-I Part C
Hashing List Searches
Basic Concepts
 In a hashed search, the key, through an algorithmic function, determines the location of
the data.
 We use a hashing algorithm to transform the key into the index that contains the data we
need to locate.
 Another way to describe hashing is as a key-to-address transformation in which the keys
map to addresses in a list.
 Hashing is a key-to-address mapping process.
Data structure Unit-I Part-C
 The address produced by the hashing algorithm is known as the home address.
 We call the set of keys that hash to the same location in our list synonyms.
 A collision occurs when a hashing algorithm produces an address for an
insertion key and that address is already occupied.
 The memory that contains all of the home addresses is known as the prime
area.
 Each calculation of an address and test for success is known as a probe.
Hashing Methods
There are eight hashing methods:
 Direct method
 Subtraction method
 Modulo-division
 Midsquare
 Digit extraction
 Rotation
 Folding
 Pseudorandom generation
Direct Method:
 In direct hashing the key is the address without any algorithmic manipulation.
 Direct hashing is limited, but it can be very powerful because it guarantees
that there are no synonyms and therefore no collision.
Subtraction Method
 Sometimes keys are consecutive but do not start from 1.
 Example:
 A company may have only 100 employees, but the employee numbers start from
1001 and go to 1100.
 In this case we use subtraction hashing, a very simple hashing function that
subtracts 1000 from the key to determine the address.
 The direct and subtraction hash functions both guarantee a search effort of one
with no collisions.
 They are one-to-one hashing methods: only one key hashes to each address.
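The direct and subtraction methods can be sketched in a few lines. This is a minimal illustration, not a definitive implementation; the function names and the `base` parameter are assumptions chosen for the employee-number example above.

```python
def direct_hash(key: int) -> int:
    """Direct hashing: the key itself is the address."""
    return key


def subtraction_hash(key: int, base: int = 1000) -> int:
    """Subtraction hashing: subtract a fixed base (assumed 1000 here) from the key."""
    return key - base


# Employee numbers 1001..1100 map one-to-one onto addresses 1..100.
addresses = [subtraction_hash(k) for k in range(1001, 1101)]
print(addresses[0], addresses[-1])  # 1 100
```

Because every key yields a distinct address, no two keys in this range are synonyms.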
Modulo-Division Method:
 Also known as division remainder, the modulo-division method divides the key by
the array size and uses the remainder for the address.
 This method gives us the simple hashing algorithm shown below in which listSize is
the number of elements in the array:
 Address = key MODULO listSize
 Example:
 Given the keys 137456, 214562, and 140145 and a list size of 19, we add 1 to the remainder so that the addresses run from 1 to 19:
 137456 % 19 + 1 = 11
 214562 % 19 + 1 = 15
 140145 % 19 + 1 = 2
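The arithmetic in this example can be checked with a short sketch. The function name and the `offset` parameter are illustrative assumptions; `offset=1` reproduces the "+ 1" used in the example, while `offset=0` gives the plain key MODULO listSize form.

```python
def modulo_division_hash(key: int, list_size: int, offset: int = 0) -> int:
    """Divide the key by the list size and use the remainder (plus an optional offset)."""
    return key % list_size + offset


for key in (137456, 214562, 140145):
    print(key, modulo_division_hash(key, 19, offset=1))
```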
Digit-Extraction Method:
 Using digit extraction, selected digits are extracted from the key and used as the address.
 Example:
 To hash our six-digit employee number to a three-digit address (000-999), we could select the first, third, and fourth digits (from the left) and use them as the address.
 379452 -> 394
 121267 -> 112
 378845 -> 388
 160252 -> 102
 045128 -> 051
Mid Square Method
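 In midsquare hashing, the key is squared and the address is selected from the middle of the squared number.
 The limitation of this method is the size of the key: the square of a key has up to twice as many digits as the key itself.
 Example:
 9452² = 89340304: address is 3403
 For larger keys, we can square only part of the key. Here we square the first three digits of each six-digit key and select the third through fifth digits of the six-digit product:
 379452: 379 * 379 = 143641 -> 364
 121267: 121 * 121 = 014641 -> 464
 378845: 378 * 378 = 142884 -> 288
 160252: 160 * 160 = 025600 -> 560
 045128: 045 * 045 = 002025 -> 202
 The same digits must be selected from the product for every key.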
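The six-digit examples above can be reproduced with the sketch below. It assumes, as in the examples, that only the first three digits of the key are squared and that the third through fifth digits of the zero-padded six-digit product form the address; the function name is hypothetical.

```python
def midsquare_hash(key: str) -> str:
    """Square the first three digits of the key and take digits 3-5 of the product."""
    part = int(key[:3])
    # Pad to six digits so the same digit positions are selected for every key.
    square = str(part * part).zfill(6)
    return square[2:5]


for key in ("379452", "121267", "378845", "160252", "045128"):
    print(key, "->", midsquare_hash(key))
```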
Folding Method
Two folding methods are used:
 Fold shift
 Fold boundary
Fold Shift
 In fold shift the key value is divided into parts whose size matches the size of the
required address.
 Then the left and right parts are shifted and added with the middle part.
Fold boundary
 In fold boundary the left and right numbers are folded on a fixed boundary between them
and the center number.
 The two outside values are thus reversed.
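Both folding methods can be sketched as follows. The key 123456789 and the three-digit address size are illustrative choices, not taken from the slides, and discarding any carry digit beyond the address size is an assumption about how the sum is trimmed.

```python
def split_parts(key: str, size: int):
    """Divide the key into consecutive parts of the required address size."""
    return [key[i:i + size] for i in range(0, len(key), size)]


def fold_shift(key: str, size: int) -> int:
    """Add the parts as they are; keep only the low `size` digits (assumption)."""
    total = sum(int(part) for part in split_parts(key, size))
    return total % 10 ** size


def fold_boundary(key: str, size: int) -> int:
    """Reverse the two outside parts before adding; keep the low `size` digits."""
    parts = split_parts(key, size)
    parts[0] = parts[0][::-1]
    parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** size


print(fold_shift("123456789", 3))     # 123 + 456 + 789 = 1368 -> 368
print(fold_boundary("123456789", 3))  # 321 + 456 + 987 = 1764 -> 764
```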
Rotation Method
 Rotation method is generally not used by itself but rather is incorporated in
combination with other hashing methods.
 It is most useful when keys are assigned serially.
 A simple hashing algorithm tends to create synonyms when hashing keys are
identical except for the last character.
 Rotating the last character to the front of the key minimizes this effect.
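As a small sketch of the rotation step, the function below moves the last character of the key to the front. The serial keys 600101 to 600103 are hypothetical examples; the rotated keys would then be passed to another hashing method.

```python
def rotate_key(key: str) -> str:
    """Move the last character of the key to the front."""
    return key[-1] + key[:-1]


for key in ("600101", "600102", "600103"):
    print(key, "->", rotate_key(key))
```

After rotation, keys that differed only in the final character differ in the leading character instead.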
Pseudorandom method
 A common random-number generator is shown below.
y= ax + c
 To use the pseudorandom-number generator as a hashing method, we set x to the
key, multiply it by the coefficient a, and then add the constant c.
 The result is then divided by the list size, with the remainder being the hashed
address.
Example (a = 17, c = 7, key = 121267, list size = 307):
y = ((17 * 121267) + 7) modulo 307
y = (2061539 + 7) modulo 307
y = 2061546 modulo 307
y = 41
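The calculation above can be verified with a short sketch. The function name is an assumption; the default coefficients a = 17 and c = 7 and the list size 307 come from the example.

```python
def pseudorandom_hash(key: int, list_size: int, a: int = 17, c: int = 7) -> int:
    """Compute y = a*x + c with x set to the key, then take the remainder."""
    return (a * key + c) % list_size


print(pseudorandom_hash(121267, 307))  # 41
```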
Hashing algorithm
 The hashing methods above may work well when we hash a key to an address in an array, but hashing to large files is generally more complex.
 Suppose we have an alphanumeric key consisting of up to 30 bytes that we need to hash into a 32-bit address.
 Step 1: Convert the alphanumeric key into a number by adding the American Standard Code for Information Interchange (ASCII) value of each character to an accumulator that will become the address.
 Step 2: As each character is added, rotate the bits in the address to maximize the distribution of the values.
 Step 3: After the characters in the key have been completely hashed, take the absolute value of the address and then map it into the address range for the file.
Analysis
First:
 The rotation can often be accomplished by an assembly language instruction.
 If the algorithm is written in a high-level language, the rotation is accomplished by a series of bitwise operations.
 For our purposes, it is sufficient that the 12 bits at the end of the address are shifted to be the 12 bits at the beginning of the address and the bits at the beginning are shifted to occupy the bit locations at the right.
Second:
 This algorithm actually uses three of the hashing methods.
 We use folding when we add the individual characters to the address, and rotation when we rotate the address after each addition.
 Finally, we use modulo division when we map the hashed address into the range of available addresses.
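The three steps and the 12-bit rotation can be sketched as below. This is an illustration under stated assumptions rather than the original algorithm's code: the names `rotate_right_12` and `hash_alnum_key` are hypothetical, the key is assumed to contain only ASCII characters, and the final mapping uses a plain remainder so addresses fall in 0 to max_addr - 1.

```python
MASK32 = 0xFFFFFFFF  # keep the accumulator within 32 bits


def rotate_right_12(value: int) -> int:
    """Move the low 12 bits of a 32-bit word to the top; shift the rest right."""
    return ((value >> 12) | (value << 20)) & MASK32


def hash_alnum_key(key: str, max_addr: int) -> int:
    addr = 0
    for ch in key:
        addr = (addr + ord(ch)) & MASK32  # Step 1: add the character's ASCII value
        addr = rotate_right_12(addr)      # Step 2: rotate the bits
    # Step 3: the masked value is never negative in Python, so the absolute
    # value is a no-op here; map the result into the address range.
    return abs(addr) % max_addr


print(hex(rotate_right_12(0x12345678)))  # 0x67812345
print(hash_alnum_key("EMP-379452", 307))
```

In a language with fixed-width signed integers, the absolute value in Step 3 matters because the rotation can set the sign bit.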
Collision Resolution
 With the exception of the direct and subtraction methods, none of the methods
used for hashing is a one-to-one mapping.
 Thus, when we hash a new key to an address, we may create a collision.
 A collision occurs when a hashing algorithm produces an address for an insertion
key and that address is already occupied.
 There are several methods for handling collisions, each of them independent of
the hashing algorithm.
Concepts
 The load factor of a hashed list is the number of elements in the list divided
by the number of physical elements allocated for the list, expressed as a
percentage.
 Traditionally, load factor is assigned the symbol alpha (α).
 The formula, in which k represents the number of filled elements in the list and
n represents the total number of elements allocated to the list, is
 α = (k / n) × 100
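As a quick illustration of the load factor formula, the sketch below computes the percentage for a hypothetical list with 230 of 307 elements filled. The function name and the sample numbers are assumptions.

```python
def load_factor(filled: int, allocated: int) -> float:
    """Load factor as a percentage: (k / n) * 100."""
    return filled / allocated * 100


print(round(load_factor(230, 307), 1))  # 74.9
```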
Computer scientists have identified two distinct types of clusters.
 (i) Primary clustering occurs when data cluster around a home address.
Primary clustering is easy to identify.
 (ii) Secondary clustering occurs when data become grouped along a collision
path throughout a list. This type of clustering is not easy to identify.
 There are two different approaches to resolving collisions:
 Open addressing
 Linked lists.
Open Addressing
 The first collision resolution method, open addressing, resolves collisions in the
prime area, that is, the area that contains all of the home addresses.
 When a collision occurs, the prime area addresses are searched for an open or
unoccupied element where the new data can be placed.
Linear Probe
 In a linear probe, which is the simplest method, when data cannot be stored in the home
address we resolve the collision by adding 1 to the current address.
 If that address is also filled, we add another 1 to the address, and we continue until we find an empty location.
 Advantages:
 First: linear probes are quite simple to implement.
 Second: data tend to remain near their home address.
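A minimal sketch of insertion with a linear probe is shown below. It assumes modulo division for the home address, `None` as the marker for an empty element, and wraparound from the last address to the first; none of these details are fixed by the slides.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Store key at its home address or the next free address; return that address."""
    size = len(table)
    home = key % size
    for probe in range(size):
        addr = (home + probe) % size  # add 1 per probe, wrapping at the end
        if table[addr] is None:
            table[addr] = key
            return addr
    raise OverflowError("hashed list is full")


table = [None] * 7
for key in (7, 14, 21, 6, 13):
    print(key, "stored at", linear_probe_insert(table, key))
```

Keys 7, 14, and 21 are synonyms here, so they end up in adjacent addresses, which is the clustering effect that the other probing methods try to reduce.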
Quadratic Probe
 Primary clustering, although not necessarily secondary clustering, can be
eliminated by adding a value other than 1 to the current address.
 One easily implemented method is to use the quadratic probe, in which the increment is the collision probe number squared: 1 for the first probe, 4 for the second, 9 for the third, and so on.
 Disadvantage:
 The time required to square the probe number.
 We can eliminate the multiplication, however, by using an increment factor that
increases by 2 with each probe.
 Adding the increment factor to the previous increment gives us the next
increment.
 The quadratic probe has one limitation:
 It is not possible to generate a new address for every element in the list.
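The multiplication-free increment and the limitation can both be seen in a short sketch. Applying each increment to the home address is an assumption (some presentations add it to the previous collision address instead), and the function names are hypothetical.

```python
def quadratic_increments(count: int) -> list:
    """Generate 1, 4, 9, ... without multiplying, using a factor that grows by 2."""
    increments = []
    increment, factor = 1, 3  # first increment is 1; the gap to the next is 3
    for _ in range(count):
        increments.append(increment)
        increment += factor   # previous increment + increment factor
        factor += 2           # the factor increases by 2 each probe
    return increments


def quadratic_probe_sequence(home: int, list_size: int, count: int) -> list:
    """Addresses tried after a collision at `home` (assumed offset from home)."""
    return [(home + inc) % list_size for inc in quadratic_increments(count)]


print(quadratic_increments(5))                        # [1, 4, 9, 16, 25]
print(quadratic_probe_sequence(1, 11, 5))             # [2, 5, 10, 6, 4]
print(len(set(quadratic_probe_sequence(0, 16, 16))))  # 4
```

The last line shows the limitation: with a list size of 16, sixteen probes reach only four distinct addresses.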
Pseudo random Collision Resolution
 The last two open addressing methods, pseudorandom collision resolution and key offset, are
collectively known as double hashing.
 In each method, rather than using an arithmetic probe function, the address is rehashed.
 Pseudorandom collision resolution uses a pseudorandom number to resolve the collision.
 The pseudorandom-number generator described earlier as a hashing method is used here as a collision resolution method. In this case, rather than use the key as a
factor in the random-number calculation, we use the collision address.
 For a collision at address 1, we resolve the collision using the following pseudorandom-number generator, where
a is 3 and c is 5:
 y = (ax + c) modulo listSize
 = (3 * 1 + 5) modulo 307
 = 8
Key Offset
 Key offset is a double hashing method that produces different collision paths
for different keys.
 Whereas the pseudorandom-number generator produces a new address as a
function of the previous address, key offset calculates the new
address as a function of the old address and the key.
offset = ⌊key / listSize⌋ (the quotient, rounded down)
address = ((offset + old address) modulo listSize)
Example
 When the key is 166702 and the list size is 307, the modulo-division
hashing method generates an address of 1. If that address is a collision:
offset = ⌊166702 / 307⌋ = 543
address = ((543 + 001) modulo 307) = 237
 If 237 were also a collision, we would repeat the process to locate the next
address:
offset = ⌊166702 / 307⌋ = 543
address = ((543 + 237) modulo 307) = 166

Key      Home address   Key offset   Probe 1   Probe 2
166702   1              543          237       166
572556   1              1865         024       047
067234   1              219          220       132
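The probe values for these three keys can be reproduced with the sketch below. The function name and return shape are assumptions; the home address is computed with modulo division, as in the example.

```python
def key_offset_probes(key: int, list_size: int, probes: int = 2):
    """Return (home address, key offset, list of successive probe addresses)."""
    home = key % list_size
    offset = key // list_size  # integer quotient of key / listSize
    addresses, addr = [], home
    for _ in range(probes):
        addr = (offset + addr) % list_size
        addresses.append(addr)
    return home, offset, addresses


for key in (166702, 572556, 67234):
    print(key, key_offset_probes(key, 307))
```

All three keys share home address 1, yet each follows a different collision path because the offset depends on the key.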
Linked list Collision Resolution
 A major disadvantage of open addressing is that each collision
resolution increases the probability of future collisions.
 This disadvantage is eliminated by the linked list approach.
 A linked list is an ordered collection of data in which each element
contains the location of the next element.
Linked List Collision Resolution
[Figure: Linked list collision resolution. Prime area addresses [000] to [008], [305], and [306]. Records shown: 379452 Marry Dodd, 070918 Sarah Trapp, 121267 Bryan Devaux, 378845 Patrick Linn, 160252 Tuan Ngo, 045128 Feldman, 166702 Harry Eagle, 572556 Chris Walljasper.]
Linked list Collision Resolution
 The linked list approach uses a separate area to store collisions and chains all synonyms together in a
linked list.
 It uses two storage areas: the prime area and the overflow area.
 Each element in the prime area contains an additional field, a link
head pointer to a linked list of overflow data in the overflow
area.
 When a collision occurs, one element is stored in the prime area and
chained to its corresponding linked list in the overflow area.
 The overflow area is typically implemented as a linked list in
dynamic memory.
Linked list Collision Resolution
 The linked list data can be stored in any order, but a LIFO sequence or a key
sequence is the most common.
 The LIFO sequence is the fastest for inserts because the linked list need
not be scanned to insert the data.
 The element being inserted into overflow is simply placed at the beginning of the
linked list and linked to the node in the prime area.
 In key-sequenced lists, the key in the prime area is the smallest, which provides
for faster search retrieval.
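The prime area, overflow chain, and LIFO insertion can be sketched as follows. This is a minimal illustration under assumptions: class and field names are hypothetical, modulo division supplies the home address, and the stored data values are placeholders.

```python
class Node:
    """One overflow-area element: a record plus a link to the next node."""
    def __init__(self, key, data, next_node=None):
        self.key, self.data, self.next = key, data, next_node


class Slot:
    """One prime-area element: a record plus the head of its overflow chain."""
    def __init__(self):
        self.key = None
        self.data = None
        self.overflow = None


class ChainedHashList:
    def __init__(self, size: int):
        self.size = size
        self.prime = [Slot() for _ in range(size)]

    def insert(self, key: int, data) -> None:
        slot = self.prime[key % self.size]
        if slot.key is None:
            slot.key, slot.data = key, data  # home address is free
        else:
            # LIFO: the new node goes at the head of the chain, no scan needed.
            slot.overflow = Node(key, data, slot.overflow)

    def search(self, key: int):
        slot = self.prime[key % self.size]
        if slot.key == key:
            return slot.data
        node = slot.overflow
        while node is not None:
            if node.key == key:
                return node.data
            node = node.next
        return None


hashed = ChainedHashList(307)
for key, data in ((166702, "record A"), (572556, "record B"), (67234, "record C")):
    hashed.insert(key, data)
print(hashed.search(572556))  # record B
```

All three keys hash to address 1, so the first stays in the prime area and the other two are chained in overflow, with the most recent insert at the head of the chain.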
Bucket Hashing
 Keys are hashed to buckets, nodes that accommodate multiple
data occurrences.
 Because a bucket can hold multiple data, collisions are postponed until the
bucket is full.
Example
 Each address is large enough to hold data for three employees.
 A collision will not occur until we try to add a fourth employee to an
address.
Two problems
 It uses more space, because many of the buckets are empty or partially
empty at any time.
 It does not completely resolve the collision problem.
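A small sketch shows how a bucket postpones collisions. The class name, the use of modulo division, and the choice to return `False` on a full bucket (leaving resolution to the caller) are assumptions; the capacity of three matches the employee example.

```python
class BucketHashList:
    def __init__(self, bucket_count: int, bucket_capacity: int = 3):
        self.capacity = bucket_capacity
        self.buckets = [[] for _ in range(bucket_count)]

    def insert(self, key: int, data) -> bool:
        """Return True if stored; False if the bucket is full (a collision)."""
        bucket = self.buckets[key % len(self.buckets)]
        if len(bucket) >= self.capacity:
            return False  # the caller must resolve this collision another way
        bucket.append((key, data))
        return True


buckets = BucketHashList(307)
# Four hypothetical keys that all hash to bucket 1.
results = [buckets.insert(key, None) for key in (1, 308, 615, 922)]
print(results)  # [True, True, True, False]
```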
Bucket Hashing
[Figure: Bucket hashing. Buckets labeled Bucket 0, Bucket 1, Bucket 2, ... Bucket 307. Records shown: 379452 Marry Dodd, 070918 Sarah Trapp, 166702 Harry Eagle, 367173 Ann Giorgis, 121267 Bryan Devaux, 572556 Chris Walljasper, 045128 Feldman.]
Combination Approaches
 There are several approaches to resolving collisions.
 As we saw with the hashing methods, a complex implementation often uses
multiple steps.
 Example:
 One large database implementation hashes to a bucket.
 If the bucket is full, it uses a set number of linear probes, such as three, to
resolve the collision and then uses a linked list overflow area.
