Data Structures Using C++ 2E
Chapter 9
Searching and Hashing Algorithms
Objectives
• Learn the various search algorithms
• Explore how to implement the sequential and binary search algorithms
• Discover how the sequential and binary search algorithms perform
• Become aware of the lower bound on comparison-based search algorithms
• Learn about hashing
Search Algorithms
• Item key
– Unique member of the item
– Used in searching, sorting, insertion, and deletion
• Number of key comparisons
– Number of times the key of the search item is compared with the key of an item in the list
• Can use class arrayListType (Chapter 3)
– Implements a list and its basic operations in an array
Sequential Search
• Array-based lists
– Covered in Chapter 3
• Linked lists
– Covered in Chapter 5
• Works the same for array-based lists and linked lists
• See code on page 499
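The code on page 499 is not reproduced in these slides. As a rough illustration only, here is a minimal sketch of sequential search over a plain C++ array; the free-function form and its parameters (list, length, item) are assumptions, not the book's arrayListType member function.

```cpp
#include <cassert>

// Sequential (linear) search: compare item with each element in turn.
// Returns the index of the first match, or -1 if item is not in the list.
template <class elemType>
int seqSearch(const elemType list[], int length, const elemType& item)
{
    assert(length >= 0);
    for (int loc = 0; loc < length; loc++)
        if (list[loc] == item)   // one key comparison per iteration
            return loc;
    return -1;                   // unsuccessful: length comparisons were made
}
```

For example, seqSearch(a, 4, 9) on the array {5, 3, 9, 1} returns 2; searching for an absent key makes length comparisons and returns -1.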
Sequential Search Analysis
• Examine the effect of the for loop in the code on page 499
• Different programmers might implement the same algorithm differently
• Computer speed affects performance
Sequential Search Analysis (cont’d.)
• Sequential search algorithm performance
– Examine the worst case and the average case
– Count the number of key comparisons
• Unsuccessful search
– Search item is not in the list
– Makes n comparisons
• Conducting algorithm performance analysis
– Best case: one key comparison (search item is the first element)
– Worst case: n comparisons (search item is the last element or is not in the list)
Sequential Search Analysis (cont’d.)
• Determining the average number of comparisons
– Consider all possible cases
– Find the number of comparisons for each case
– Add the numbers of comparisons and divide by the number of cases
Sequential Search Analysis (cont’d.)
• Determining the average number of comparisons (cont’d.)
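The formula on this slide did not survive extraction. The standard result can be reconstructed as follows, assuming the search is successful and the search item is equally likely to be at any of the n positions; if the item is at position k (counting from 1), sequential search makes k key comparisons.

```latex
\[
\text{average} \;=\; \frac{1 + 2 + \cdots + n}{n}
\;=\; \frac{1}{n}\cdot\frac{n(n+1)}{2}
\;=\; \frac{n+1}{2}
\]
```

So a successful sequential search examines about half the list on average; for n = 1000 that is 500.5 comparisons.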
Ordered Lists
• Elements ordered according to some criteria
– Usually ascending order
• Operations
– Same as those on an unordered list
• Determining whether the list is empty or full, determining the list length, printing the list, clearing the list
• Defining an ordered list as an abstract data type (ADT)
– Use inheritance to derive the class that implements ordered lists from class arrayListType
– Define two classes
Ordered Lists (cont’d.)
Binary Search
• Performed only on ordered lists
• Uses divide-and-conquer technique
FIGURE 9-1 List of length 12
FIGURE 9-2 Search list, list[0]...list[11]
FIGURE 9-3 Search list, list[6]...list[11]
Binary Search (cont’d.)
• C++ function implementing binary search algorithm
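The function itself is not reproduced in these slides. A minimal sketch, assuming a plain array sorted in ascending order and returning -1 for an unsuccessful search (the signature is an assumption; the variable names first, last, and mid follow Tables 9-1 through 9-3):

```cpp
#include <cassert>

// Binary search on a list sorted in ascending order.
// Returns the index of item, or -1 if it is not present.
template <class elemType>
int binarySearch(const elemType list[], int length, const elemType& item)
{
    assert(length >= 0);
    int first = 0;
    int last = length - 1;

    while (first <= last)
    {
        int mid = first + (last - first) / 2;  // middle of the current sublist
        if (list[mid] == item)
            return mid;
        else if (list[mid] > item)
            last = mid - 1;                    // continue in the lower half
        else
            first = mid + 1;                   // continue in the upper half
    }
    return -1;                                 // sublist became empty
}
```

Computing mid as first + (last - first) / 2 rather than (first + last) / 2 is a common safeguard against integer overflow for very large lists; otherwise the two give the same result.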
Binary Search (cont’d.)
• Example 9-1
FIGURE 9-4 Sorted list for a binary search
TABLE 9-1 Values of first, last, and mid and the number of comparisons for search item 89
Binary Search (cont’d.)
TABLE 9-2 Values of first, last, and mid and the number of comparisons for search item 34
TABLE 9-3 Values of first, last, and mid and the number of comparisons for search item 22
Insertion into an Ordered List
• After insertion, the resulting list must be ordered
– Find the place in the list to insert the item
• Use an algorithm similar to the binary search algorithm
– Slide list elements one array position down to make room for the item to be inserted
– Insert the item
• Use function insertAt (class arrayListType)
Insertion into an Ordered List (cont’d.)
• Algorithm to insert the item
• Function insertOrd implements the algorithm
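The algorithm and function body are not reproduced in these slides. A minimal sketch, assuming a plain array with spare capacity (the parameters length and maxSize are hypothetical; the book's insertOrd is a member function that calls insertAt): a binary-search-style loop finds the first element greater than the item, the later elements slide one position down, and the item goes into the gap.

```cpp
#include <cassert>

// Insert item into list[0..length-1], which is kept in ascending order.
// The array must have room for at least length + 1 elements.
// Returns the new length.
template <class elemType>
int insertOrd(elemType list[], int length, int maxSize, const elemType& item)
{
    assert(length >= 0 && length < maxSize);   // list must not be full

    // Binary-search-like step: find the first position whose element > item.
    int first = 0, last = length;               // search range [first, last)
    while (first < last)
    {
        int mid = first + (last - first) / 2;
        if (list[mid] > item)
            last = mid;
        else
            first = mid + 1;
    }

    // Slide elements one position down to make room, then insert.
    for (int i = length; i > first; i--)
        list[i] = list[i - 1];
    list[first] = item;
    return length + 1;
}
```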
Insertion into an Ordered List (cont’d.)
• Add the binary search algorithm and the insertOrd algorithm to class orderedArrayListType
Insertion into an Ordered List (cont’d.)
• class orderedArrayListType
– Derived from class arrayListType
– List elements of orderedArrayListType are ordered
• Must override functions insertAt and insertEnd of class arrayListType in class orderedArrayListType
– So that when an object of type orderedArrayListType uses these functions, the list elements remain in order
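To illustrate why insertAt and insertEnd are overridden, here is a minimal sketch. The base class below is a hypothetical, simplified stand-in for the book's arrayListType (it stores elements in a std::vector and keeps only the members needed here); the point is that both overrides route through insertOrd, so callers cannot break the ordering, even through a base-class reference.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the book's arrayListType.
template <class elemType>
class arrayListType
{
public:
    explicit arrayListType(int size = 100) : maxSize(size) {}
    virtual ~arrayListType() = default;

    int listSize() const { return static_cast<int>(list.size()); }
    bool isFull() const { return listSize() >= maxSize; }
    const elemType& at(int i) const { return list.at(i); }

    // Unordered list: put item exactly at position location.
    virtual void insertAt(int location, const elemType& item)
    {
        assert(!isFull() && location >= 0 && location <= listSize());
        list.insert(list.begin() + location, item);
    }

    virtual void insertEnd(const elemType& item) { insertAt(listSize(), item); }

protected:
    std::vector<elemType> list;
    int maxSize;
};

// Ordered list: every insertion is routed through insertOrd.
template <class elemType>
class orderedArrayListType : public arrayListType<elemType>
{
public:
    explicit orderedArrayListType(int size = 100) : arrayListType<elemType>(size) {}

    // Find the first element not less than item and insert before it.
    void insertOrd(const elemType& item)
    {
        int loc = 0;
        while (loc < this->listSize() && this->list[loc] < item)
            loc++;
        arrayListType<elemType>::insertAt(loc, item);  // base version: no recursion
    }

    // Overrides ignore the requested position so the list stays ordered.
    void insertAt(int /*location*/, const elemType& item) override { insertOrd(item); }
    void insertEnd(const elemType& item) override { insertOrd(item); }
};
```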
Insertion into an Ordered List (cont’d.)
• Can also override function seqSearch
– Perform sequential search on an ordered list
• Takes into account that elements are ordered: the search can stop as soon as it reaches an element greater than the search item
TABLE 9-4 Number of comparisons for a list of length n
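A minimal sketch of a sequential search that exploits the ordering, assuming an ascending array (the name orderedSeqSearch is hypothetical; the slides describe overriding seqSearch instead). It stops at the first element that is not less than the item, so an unsuccessful search usually ends before reaching the end of the list.

```cpp
#include <cassert>

// Sequential search on a list sorted in ascending order.
// Returns the index of item, or -1 if it is not present.
template <class elemType>
int orderedSeqSearch(const elemType list[], int length, const elemType& item)
{
    assert(length >= 0);
    int loc = 0;
    while (loc < length && list[loc] < item)   // skip smaller elements
        loc++;
    if (loc < length && list[loc] == item)     // stopped on a match
        return loc;
    return -1;                                 // stopped on a larger element or at the end
}
```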
Lower Bound on Comparison-Based Search Algorithms
• Comparison-based search algorithms
– Search the list by comparing the target element with list elements
• Sequential search: order n
• Binary search: order log₂n
Lower Bound on Comparison-Based Search Algorithms (cont’d.)
• Devising a search algorithm of order less than log₂n
– Obtain a lower bound on the number of comparisons made by comparison-based search algorithms
• Such an algorithm cannot be comparison based
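A sketch of the standard decision-tree argument behind this bound, given as a summary rather than the book's proof: model any comparison-based search of a sorted list of length n as a binary tree in which each node is one three-way key comparison (less, equal, or greater). Each position at which the item could be found must correspond to a distinct comparison node, so the tree has at least n nodes, and a binary tree of height h has at most 2^h – 1 nodes.

```latex
\[
n \;\le\; 2^{h} - 1
\quad\Longrightarrow\quad
h \;\ge\; \log_2(n+1)
\]
```

Here h is the number of comparisons on the longest path, that is, the worst case. Binary search, counting each three-way test as one comparison, makes at most ⌈log₂(n + 1)⌉ comparisons, so it meets this bound, and no comparison-based algorithm can do better than order log₂n in the worst case.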
Hashing
• Algorithm of order one (on average)
• Requires data to be specially organized
– Hash table
• Helps organize data
• Stored in an array
• Denoted by HT
– Hash function
• Arithmetic function denoted by h
• Applied to key X
• Compute h(X): read as h of X
• h(X) gives address of the item
Hashing (cont’d.)
• Organizing data in the hash table
– Store data within the hash table (array)
– Store data in linked lists
• Hash table HT divided into b buckets
– HT[0], HT[1], . . ., HT[b – 1]
– Each bucket capable of holding r items
– It follows that br = m, where m is the size of HT
– Generally r = 1
• Each bucket can hold one item
• The hash function h maps key X onto an integer t
– h(X) = t, such that 0 <= h(X) <= b – 1
Hashing (cont’d.)
• See Examples 9-2 and 9-3
• Synonyms
– Keys X1 and X2, with X1 ≠ X2, are synonyms if h(X1) = h(X2)
• Overflow
– Occurs if bucket t is full
• Collision
– Occurs if h(X1) = h(X2)
• For nonidentical keys X1 and X2
Hashing (cont’d.)
• Overflow and collision occur at the same time
– If r = 1 (bucket size = one)
• Choosing a hash function
– Main objectives
• Choose an easy-to-compute hash function
• Minimize the number of collisions
• Let HTSize denote the size of the hash table (the size of the array holding the hash table)
– Assume bucket size = one
• Each bucket can hold one item
• Overflow and collision occur simultaneously
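Hash Functions: Some Examples
• Mid-square
– Square the key and use its middle digits or bits as the address
• Folding
– Break the key into parts and combine them, for example by adding them
• Division (modular arithmetic)
– In C++
• h(X) = iX % HTSize; where iX is an integer representation of key X
– C++ function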
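The C++ function referred to on the slide is not reproduced here. The sketch below shows one small, hypothetical function per technique. Assumptions: division hashing forms iX by summing the character codes of a string key; mid-square takes r bits from the middle of the square of a 16-bit key; folding splits a decimal key into three-digit groups and adds them. Division and folding return an index in the range 0 to HTSize – 1, and mid-square returns one in the range 0 to 2^r – 1.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Division: h(X) = iX % HTSize, with iX = sum of the key's character codes.
int divisionHash(const std::string& key, int HTSize)
{
    assert(HTSize > 0);
    unsigned long iX = 0;                 // integer representation of the key
    for (unsigned char ch : key)
        iX += ch;
    return static_cast<int>(iX % static_cast<unsigned long>(HTSize));
}

// Mid-square: square a 16-bit key (up to 32 bits) and keep the r bits
// in the middle of the square.
int midSquareHash(std::uint16_t key, int r)
{
    assert(r > 0 && r <= 16);
    std::uint32_t sq = static_cast<std::uint32_t>(key) * key;
    int shift = (32 - r) / 2;             // center the r-bit window
    return static_cast<int>((sq >> shift) & ((1u << r) - 1));
}

// Folding: split the decimal key into 3-digit groups, add them,
// then reduce the sum modulo HTSize.
int foldingHash(unsigned long long key, int HTSize)
{
    assert(HTSize > 0);
    unsigned long long sum = 0;
    while (key > 0)
    {
        sum += key % 1000;                // next 3-digit group from the right
        key /= 1000;
    }
    return static_cast<int>(sum % static_cast<unsigned long long>(HTSize));
}
```

For example, divisionHash("abc", 10) sums 97 + 98 + 99 = 294 and returns 4, and foldingHash(123456789, 1000) adds 123 + 456 + 789 = 1368 and returns 368.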
Collision Resolution
• Desirable to minimize the number of collisions
– Collisions are unavoidable in reality
• A hash function generally maps a larger domain onto a smaller range
• Collision resolution technique categories
– Open addressing (closed hashing)
• Data stored within the hash table
– Chaining (open hashing)
• Data organized in linked lists
• Hash table: array of pointers to the linked lists
Collision Resolution: Open Addressing
• Data stored within the hash table
– For each key X, h(X) gives the index in the array
• Where the item with key X is likely to be stored
Linear Probing
• Starting at location t
– Search the array sequentially to find the next available slot
• Assume a circular array
– If the lower portion of the array is full
• Can continue the search in the top portion of the array using the mod operator
– Starting at t, check array locations using the probe sequence
• t, (t + 1) % HTSize, (t + 2) % HTSize, . . ., (t + j) % HTSize
Linear Probing (cont’d.)
• The next array slot is given by
– (h(X) + j) % HTSize, where j is the jth probe
• See Example 9-4
• C++ code implementing linear probing
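The C++ code on the slide is not reproduced here. A minimal sketch of insertion with linear probing, assuming nonnegative integer keys, the hypothetical sentinel EMPTY = -1 for free slots, and h(X) = X % HTSize:

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // assumption: keys are nonnegative, -1 marks a free slot

// Insert key into HT using linear probing from home position t = key % HTSize.
// Returns the slot used, or -1 if the key is a duplicate or the table is full.
int linearProbeInsert(std::vector<int>& HT, int key)
{
    const int HTSize = static_cast<int>(HT.size());
    assert(HTSize > 0 && key >= 0);
    int t = key % HTSize;                  // h(X): division hashing

    for (int j = 0; j < HTSize; j++)       // at most HTSize probes
    {
        int slot = (t + j) % HTSize;       // probe sequence t, t + 1, t + 2, ...
        if (HT[slot] == key)
            return -1;                     // duplicates not allowed
        if (HT[slot] == EMPTY)
        {
            HT[slot] = key;
            return slot;
        }
    }
    return -1;                             // table full
}
```

With HTSize = 7, the keys 10, 17, and 24 all hash to 3 and land in slots 3, 4, and 5, a small instance of the clustering described next.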
Linear Probing (cont’d.)
• Causes clustering
– More and more new keys are likely to hash to array slots that are already occupied
FIGURE 9-5 Hash table of size 20
FIGURE 9-6 Hash table of size 20 with certain positions occupied
FIGURE 9-7 Hash table of size 20 with certain positions occupied
Linear Probing (cont’d.)
• Improving linear probing
– Skip array positions by a fixed constant (c) instead of one
– New hash address: (h(X) + i * c) % HTSize
• If c = 2 and h(X) = 2k (h(X) even)
– Only even-numbered array positions visited
• If c = 2 and h(X) = 2k + 1 (h(X) odd)
– Only odd-numbered array positions visited
• To visit all the array positions
– Constant c must be relatively prime to HTSize
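To check the rule about c, the hypothetical helper below counts how many distinct slots the sequence (t + i * c) % HTSize visits for i = 0, 1, ..., HTSize – 1. In general the count is HTSize divided by gcd(c, HTSize), so the sequence reaches every slot only when c and HTSize are relatively prime.

```cpp
#include <cassert>
#include <vector>

// Count the distinct slots visited by (t + i * c) % HTSize, i = 0 .. HTSize - 1.
int slotsVisited(int t, int c, int HTSize)
{
    assert(HTSize > 0 && c > 0 && t >= 0 && t < HTSize);
    std::vector<bool> seen(HTSize, false);
    int count = 0;
    for (int i = 0; i < HTSize; i++)
    {
        int slot = (t + i * c) % HTSize;
        if (!seen[slot])
        {
            seen[slot] = true;
            count++;
        }
    }
    return count;
}
```

For HTSize = 20, slotsVisited(0, 2, 20) and slotsVisited(1, 2, 20) both return 10 (even or odd slots only), while slotsVisited(0, 3, 20) returns 20.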
Random Probing
• Uses a random number generator to find the next available slot
– ith slot in the probe sequence: (h(X) + r_i) % HTSize
• Where r_i is the ith value in a random permutation of the numbers 1 to HTSize – 1
– All insertions and searches use the same sequence of random numbers
• See Example 9-5
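A minimal sketch, assuming the permutation is generated once with std::shuffle and a fixed-seed std::mt19937 so that every insertion and search replays the same sequence (the class name RandomProber and its interface are hypothetical):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Random probing: the ith probe is (t + r_i) % HTSize, where r_1, ..., r_{HTSize-1}
// is a random permutation of 1..HTSize-1 fixed when the prober is built.
class RandomProber
{
public:
    RandomProber(int HTSize, unsigned seed)
        : size(HTSize), r(HTSize > 1 ? HTSize - 1 : 0)
    {
        assert(HTSize > 1);
        std::iota(r.begin(), r.end(), 1);        // 1, 2, ..., HTSize - 1
        std::mt19937 gen(seed);                  // same seed, same permutation
        std::shuffle(r.begin(), r.end(), gen);
    }

    // Slot examined by probe i; probe 0 is the home position t itself.
    int probe(int t, int i) const
    {
        assert(t >= 0 && t < size && i >= 0 && i < size);
        return i == 0 ? t : (t + r[i - 1]) % size;
    }

private:
    int size;
    std::vector<int> r;
};
```

Because the offsets are a permutation of 1 to HTSize – 1, probes 0 through HTSize – 1 visit every slot exactly once.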
Rehashing
• If a collision occurs with hash function h
– Use a series of hash functions: h_1, h_2, . . ., h_s
– If a collision occurs at h(X)
• Array slots h_i(X), 1 <= i <= s, are examined
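A minimal sketch of insertion by rehashing, assuming nonnegative integer keys, the hypothetical sentinel EMPTY = -1, and a caller-supplied list of hash functions h_1, ..., h_s, each returning a valid index:

```cpp
#include <cassert>
#include <functional>
#include <vector>

const int EMPTY = -1;   // assumption: keys are nonnegative, -1 marks a free slot

// Try h_1(X), h_2(X), ..., h_s(X) in order until a free slot is found.
// Returns the slot used, or -1 for a duplicate or if every h_i collides.
int rehashInsert(std::vector<int>& HT, int key,
                 const std::vector<std::function<int(int)>>& h)
{
    assert(!HT.empty() && key >= 0);
    for (const auto& hi : h)
    {
        int slot = hi(key);
        assert(slot >= 0 && slot < static_cast<int>(HT.size()));
        if (HT[slot] == key)
            return -1;                  // duplicate key
        if (HT[slot] == EMPTY)
        {
            HT[slot] = key;
            return slot;
        }
    }
    return -1;                          // collision under every h_i
}
```

For instance, with HTSize = 10 and h_1(X) = X % 10, h_2(X) = (X / 10) % 10, h_3(X) = (7X) % 10, inserting 42, 32, and 22 in that order places them in slots 2, 3, and 4.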
Quadratic Probing
• Suppose
– Item with key X hashed at t (h(X) = t and 0 <= t <= HTSize – 1)
– Position t already occupied
• Starting at position t
– Search the array at locations (t + 1) % HTSize, (t + 2²) % HTSize = (t + 4) % HTSize, (t + 3²) % HTSize = (t + 9) % HTSize, . . ., (t + i²) % HTSize
• Probe sequence: t, (t + 1) % HTSize, (t + 2²) % HTSize, (t + 3²) % HTSize, . . ., (t + i²) % HTSize
Quadratic Probing (cont’d.)
• See Example 9-6
• Reduces primary clustering
• Does not probe all positions in the table
– When HTSize is a prime, probes about half the table before repeating the probe sequence
• After a considerable number of probes
– Assume the table is full
– Stop insertion (and search)
Quadratic Probing (cont’d.)
• Generating the probe sequence
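The content of this slide did not survive extraction. One standard way to generate the sequence without computing i² directly rests on the identity that i² is the sum of the first i odd numbers, so each probe can be obtained from the previous one by adding the next odd number:

```latex
\[
i^2 \;=\; \sum_{k=1}^{i} (2k - 1) \;=\; 1 + 3 + 5 + \cdots + (2i - 1)
\]
\[
(t + i^2) \bmod \mathit{HTSize}
\;=\; \bigl( (t + (i-1)^2) \bmod \mathit{HTSize} + (2i - 1) \bigr) \bmod \mathit{HTSize}
\]
```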
Quadratic Probing (cont’d.)
• Consider the probe sequence
– t, t + 1, t + 2², t + 3², . . ., (t + i²) % HTSize
– C++ code computes the ith probe
• (t + i²) % HTSize
Quadratic Probing (cont’d.)
• Pseudocode implementing quadratic probing
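The pseudocode on the slide is not reproduced here. A minimal C++ sketch of the same idea, assuming nonnegative integer keys, the hypothetical sentinel EMPTY = -1, and h(X) = X % HTSize. It adds the odd numbers 1, 3, 5, ... to step from (t + (i – 1)²) to (t + i²) and gives up after HTSize / 2 probes, treating the table as full:

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // assumption: keys are nonnegative, -1 marks a free slot

// Quadratic probing: probe t, t + 1, t + 4, t + 9, ... (mod HTSize).
// Returns the slot holding key afterwards (found or newly inserted),
// or -1 if no free slot was found within HTSize / 2 probes.
int quadraticProbeInsert(std::vector<int>& HT, int key)
{
    const int HTSize = static_cast<int>(HT.size());
    assert(HTSize > 0 && key >= 0);
    int hIndex = key % HTSize;   // home position t = h(X)
    int inc = 1;                 // next odd number to add
    int pCount = 0;              // probes made after the home position

    while (HT[hIndex] != EMPTY && HT[hIndex] != key && pCount < HTSize / 2)
    {
        pCount++;
        hIndex = (hIndex + inc) % HTSize;   // now (t + pCount * pCount) % HTSize
        inc += 2;
    }

    if (HT[hIndex] == key)
        return hIndex;           // already present
    if (HT[hIndex] == EMPTY)
    {
        HT[hIndex] = key;
        return hIndex;
    }
    return -1;                   // assume the table is full
}
```

With HTSize = 11, the keys 3, 14, 25, and 36 all have home position 3 and are placed in slots 3, 4, 7, and 1 (since (3 + 9) % 11 = 1).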
Quadratic Probing (cont’d.)
• Random and quadratic probing eliminate primary clustering
• Secondary clustering
– Random and quadratic probe sequences are functions of the home position
• Not of the original key
Quadratic Probing (cont’d.)
• Secondary clustering (cont’d.)
– If two nonidentical keys (X1 and X2) hash to the same home position (h(X1) = h(X2))
• The same probe sequence is followed for both keys
– If the hash function causes a cluster at a particular home position
• The cluster remains under these probings
Quadratic Probing (cont’d.)
• Solve secondary clustering with double hashing
– Use linear probing
• Increment value: a function of the key
– If a collision occurs at h(X)
• Probe sequence: (h(X) + i * g(X)) % HTSize, i = 0, 1, 2, . . ., where g is a second hash function
• See Examples 9-7 and 9-8
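A minimal sketch of double hashing. The second hash function g(X) = 1 + X % (HTSize – 1) is an assumption chosen for illustration (Examples 9-7 and 9-8 may use different functions); with a prime HTSize it yields an increment between 1 and HTSize – 1 that is relatively prime to HTSize. Keys are assumed nonnegative and EMPTY = -1 is a hypothetical sentinel.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // assumption: keys are nonnegative, -1 marks a free slot

// Double hashing: probe (h(X) + i * g(X)) % HTSize for i = 0, 1, 2, ...
// Assumed functions: h(X) = X % HTSize, g(X) = 1 + X % (HTSize - 1).
// Returns the slot holding key afterwards, or -1 if the table is full.
int doubleHashInsert(std::vector<int>& HT, int key)
{
    const int HTSize = static_cast<int>(HT.size());
    assert(HTSize > 1 && key >= 0);
    int h = key % HTSize;
    int g = 1 + key % (HTSize - 1);   // increment depends on the key itself

    for (int i = 0; i < HTSize; i++)
    {
        int slot = (h + i * g) % HTSize;
        if (HT[slot] == key)
            return slot;              // already present
        if (HT[slot] == EMPTY)
        {
            HT[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

With HTSize = 11, the synonyms 14, 25, and 36 (all with home position 3) get increments 5, 6, and 7. Inserted after 3, they follow different probe sequences and land in slots 8, 9, and 10, which is how double hashing avoids secondary clustering.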
Deletion: Open Addressing
• Designing a class as an ADT
– Implement hashing using quadratic probing
• Use two arrays
– One stores the data
– The other, indexStatusList (as described in the previous section), indicates whether each position in the hash table is free, occupied, or previously used
• See code on pages 521 and 522
– Class template implementing hashing as an ADT
– Definition of function insert
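The code on pages 521 and 522 is not reproduced here. The simplified sketch below (int keys only; the class name QuadraticHashTable and the status codes 0 = free, 1 = occupied, -1 = previously used are assumptions) shows why a status array is needed for deletion: remove only marks a slot as previously used, because marking it free could cut short the probe sequence of a key that was inserted past it.

```cpp
#include <cassert>
#include <vector>

// Open addressing with quadratic probing plus a parallel status array.
// indexStatusList values (assumed encoding):
//    0 = free, 1 = occupied, -1 = previously used (item was deleted)
class QuadraticHashTable
{
public:
    explicit QuadraticHashTable(int size)
        : HTable(size), indexStatusList(size, 0), HTSize(size)
    {
        assert(size > 0);
    }

    // Insert key; returns false for a duplicate or if no slot is found.
    bool insert(int key)
    {
        int firstDeleted = -1;
        int pos = probe(key, firstDeleted);
        if (pos >= 0 && indexStatusList[pos] == 1)
            return false;                  // key already present
        if (firstDeleted >= 0)
            pos = firstDeleted;            // reuse a previously used slot
        if (pos < 0)
            return false;                  // treated as table full
        HTable[pos] = key;
        indexStatusList[pos] = 1;
        return true;
    }

    bool search(int key) const
    {
        int firstDeleted = -1;
        int pos = probe(key, firstDeleted);
        return pos >= 0 && indexStatusList[pos] == 1;
    }

    // Mark the slot as previously used, not free: a key inserted while this
    // slot was occupied may have probed past it, and a free mark would end
    // that key's search here.
    bool remove(int key)
    {
        int firstDeleted = -1;
        int pos = probe(key, firstDeleted);
        if (pos < 0 || indexStatusList[pos] != 1)
            return false;
        indexStatusList[pos] = -1;
        return true;
    }

private:
    std::vector<int> HTable;
    std::vector<int> indexStatusList;
    int HTSize;

    // Follow the probe sequence t, t + 1, t + 4, ... for key. Returns the slot
    // holding key, else the first free slot reached, else -1 after
    // HTSize / 2 + 1 probes. Records the first previously used slot seen.
    int probe(int key, int& firstDeleted) const
    {
        assert(key >= 0);
        int pos = key % HTSize;
        int inc = 1;
        for (int pCount = 0; pCount <= HTSize / 2; pCount++)
        {
            if (indexStatusList[pos] == 0)
                return pos;
            if (indexStatusList[pos] == 1 && HTable[pos] == key)
                return pos;
            if (indexStatusList[pos] == -1 && firstDeleted < 0)
                firstDeleted = pos;
            pos = (pos + inc) % HTSize;
            inc += 2;
        }
        return -1;
    }
};
```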
Collision Resolution: Chaining (Open Hashing)
• Hash table HT: array of pointers
– For each j, where 0 <= j <= HTSize – 1
• HT[j] is a pointer to a linked list
• Hash table size (HTSize): less than or equal to the number of items
FIGURE 9-10 Linked hash table
Collision Resolution: Chaining (cont’d.)
• Item insertion and collision
– For each key X (in the item)
• First find h(X) = t, where 0 <= t <= HTSize – 1
• The item with this key is inserted in the linked list pointed to by HT[t]
– For nonidentical keys X1 and X2
• If h(X1) = h(X2)
– Items with keys X1 and X2 are inserted in the same linked list
• Collisions are handled quickly and effectively
Collision Resolution: Chaining (cont’d.)
• Search
– Determine whether item R with key X is in the hash table
• First calculate h(X)
– Example: h(X) = t
• The linked list pointed to by HT[t] is searched sequentially
• Deletion
– Delete item R from the hash table
• Search the hash table to find where in a linked list R exists
• Adjust pointers at the appropriate locations
• Deallocate the memory occupied by R
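A minimal sketch of a chained hash table, assuming nonnegative integer keys and h(X) = X % HTSize. It substitutes the standard std::forward_list for the hand-written linked lists the slides describe, so forward_list::remove takes care of adjusting pointers and deallocating the node; the class name ChainedHashTable is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <forward_list>
#include <iterator>
#include <vector>

// Chaining: HT[t] is a linked list holding every key X with h(X) = t.
class ChainedHashTable
{
public:
    explicit ChainedHashTable(int size) : HT(size) { assert(size > 0); }

    // Insert key at the front of its chain; duplicates are rejected.
    bool insert(int key)
    {
        auto& chain = HT[h(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end())
            return false;
        chain.push_front(key);
        return true;
    }

    // Sequentially search the one chain that key can be in.
    bool search(int key) const
    {
        const auto& chain = HT[h(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Unlink the node and free its memory (forward_list::remove does both).
    bool remove(int key)
    {
        auto& chain = HT[h(key)];
        if (std::find(chain.begin(), chain.end(), key) == chain.end())
            return false;
        chain.remove(key);
        return true;
    }

    int chainLength(int t) const
    {
        return static_cast<int>(std::distance(HT[t].begin(), HT[t].end()));
    }

private:
    std::vector<std::forward_list<int>> HT;

    int h(int key) const
    {
        assert(key >= 0);   // assumption: nonnegative integer keys
        return key % static_cast<int>(HT.size());
    }
};
```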
Collision Resolution: Chaining (cont’d.)
• Overflow
– No longer a concern
• Data stored in linked lists
• Memory space to store data allocated dynamically
– Hash table size
• No longer needs to be greater than the number of items
– If the hash table size is less than the number of items
• Some linked lists contain more than one item
• With a good hash function, the average linked list length is still small, so search remains efficient
Collision Resolution: Chaining (cont’d.)
• Advantages of chaining
– Item insertion and deletion are straightforward
– With an efficient hash function
• Few keys hash to the same home position
• Linked lists are short on average
– Shorter search length
• If item size is large
– Saves a considerable amount of space
Collision Resolution: Chaining (cont’d.)
• Disadvantage of chaining
– Small item size wastes space
• Example: 1000 items, each requiring one word of storage
– Chaining
• Requires 3000 words of storage (1000 for the items, 1000 for the link fields, and 1000 for the array of pointers, assuming HTSize = 1000)
– Quadratic probing
• If the hash table size is twice the number of items: 2000 words
• If the table size is three times the number of items
– Keys reasonably spread out
– Results in fewer collisions
Hashing Analysis
• Load factor
– α = n / HTSize, where n is the number of items in the table
TABLE 9-5 Number of comparisons in hashing
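Table 9-5 itself did not survive extraction. The approximations usually quoted for this kind of table (from Knuth's analysis) are listed below for reference; treat them as standard estimates, not a verbatim copy of Table 9-5. Here S(α) is the expected number of comparisons in a successful search and U(α) in an unsuccessful one. The random (uniform) probing row is often used as an estimate for quadratic probing.

```latex
\[
\begin{array}{lll}
\text{Method} & S(\alpha) & U(\alpha) \\ \hline
\text{Linear probing} & \tfrac12\left(1 + \dfrac{1}{1-\alpha}\right) & \tfrac12\left(1 + \dfrac{1}{(1-\alpha)^2}\right) \\[2ex]
\text{Random (uniform) probing} & -\dfrac{\ln(1-\alpha)}{\alpha} & \dfrac{1}{1-\alpha} \\[2ex]
\text{Chaining} & 1 + \dfrac{\alpha}{2} & \alpha
\end{array}
\]
```

For example, at α = 0.5 the linear probing estimates are S = 1.5 and U = 2.5 comparisons.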
Summary
• Sequential search
– Order n
• Ordered lists
– Elements ordered according to some criteria
• Binary search
– Order log₂n
• Hashing
– Data organized using a hash table
– Apply the hash function to determine whether an item with a given key is in the table
– Two ways to organize data: within the hash table (open addressing) or in linked lists (chaining)
Summary (cont’d.)
• Hash functions
– Mid-square
– Folding
– Division (modular arithmetic)
• Collision resolution technique categories
– Open addressing (closed hashing)
– Chaining (open hashing)
• Search analysis
– Review number of key comparisons
– Worst case, best case, average case
