David Luebke 1 02/10/17
CS 332: Algorithms
Introduction to Hashing
Review: Hash Tables
● Motivation: symbol tables
■ A compiler uses a symbol table to relate symbols
to associated data
○ Symbols: variable names, procedure names, etc.
○ Associated data: memory location, call graph, etc.
■ For a symbol table (also called a dictionary), we
care about search, insertion, and deletion
■ We typically don’t care about sorted order
Review: Hash Tables
● Hash table:
■ Given a table T and a record x, with key (=
symbol) and satellite data, we need to support:
○ Insert (T, x)
○ Delete (T, x)
○ Search(T, x)
■ We want these to be fast, but don’t care about
sorting the records
■ In this discussion we consider all keys to be
(possibly large) natural numbers
● The structure we will use is a hash table
■ Supports all the above in O(1) expected time!
Hashing: Keys
● In the following discussions we will consider
all keys to be (possibly large) natural numbers
● How can we convert floats to natural numbers
for hashing purposes?
● How can we convert ASCII strings to natural
numbers for hashing purposes?
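One possible answer to both questions, as a minimal sketch: read an ASCII string as a number written in radix 128, and reinterpret a float's 64-bit IEEE 754 encoding as an unsigned integer. The function names are hypothetical, and other conversions are equally valid.

```python
import struct


def string_to_natural(s: str, radix: int = 128) -> int:
    """Interpret an ASCII string as a natural number written in base `radix`."""
    n = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError("character outside the assumed ASCII range")
        n = n * radix + code
    return n


def float_to_natural(x: float) -> int:
    """Reinterpret the 64-bit IEEE 754 encoding of x as an unsigned integer."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits


if __name__ == "__main__":
    print(string_to_natural("pt"))     # 112 * 128 + 116 = 14452
    print(hex(float_to_natural(1.0)))  # 0x3ff0000000000000
```

One caveat of the bit-pattern approach: 0.0 and -0.0 compare equal as floats but map to different natural numbers, so a real table would need to normalize such values first.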
Direct Addressing
● Suppose:
■ The range of keys is 0..m-1
■ Keys are distinct
● The idea:
■ Set up an array T[0..m-1] in which
○ T[i] = x if x ∈ T and key[x] = i
○ T[i] = NULL otherwise
■ This is called a direct-address table
○ Operations take O(1) time!
○ So what’s the problem?
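The direct-address table can be sketched in a few lines. The class name and the (key, data) record layout are assumptions made for illustration; None stands in for NULL.

```python
class DirectAddressTable:
    """Array T[0..m-1] indexed directly by key; assumes distinct keys in 0..m-1."""

    def __init__(self, m):
        # Initializing every slot to None (NULL) already costs time proportional to m.
        self.slots = [None] * m

    def insert(self, key, data):
        self.slots[key] = (key, data)  # O(1)

    def delete(self, key):
        self.slots[key] = None         # O(1)

    def search(self, key):
        return self.slots[key]         # O(1); None means "not present"
```

Every operation is a single array access, which is why the table is fast, and also why its size must equal the full range of possible keys.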
The Problem With
Direct Addressing
● Direct addressing works well when the range
m of keys is relatively small
● But what if the keys are 32-bit integers?
■ Problem 1: direct-address table will have
2^32 entries, more than 4 billion
■ Problem 2: even if memory is not an issue, the
time to initialize the elements to NULL may be
prohibitive
● Solution: map keys to smaller range 0..m-1
● This mapping is called a hash function
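To put a number on Problem 1, here is a back-of-the-envelope calculation. The 8-byte slot size is an assumption (one 64-bit pointer per entry); the slides do not state one.

```python
KEY_BITS = 32        # keys are 32-bit integers, as on the slide
POINTER_BYTES = 8    # assumption: each slot holds one 64-bit pointer

entries = 2 ** KEY_BITS                 # 4294967296 slots
table_bytes = entries * POINTER_BYTES   # total size of the pointer array
table_gib = table_bytes / 2 ** 30       # 32.0 GiB

if __name__ == "__main__":
    print(entries, table_gib)
```

Under that assumption the empty table alone occupies 32 GiB, regardless of how few keys are actually stored.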
Hash Functions
● Next problem: collision
[Figure: hash function h maps the actual keys K = {k1, ..., k5}, a subset of the universe of keys U, into slots 0..m-1 of table T; h(k2) = h(k5), so k2 and k5 collide.]
Resolving Collisions
● How can we solve the problem of collisions?
● Solution 1: chaining
● Solution 2: open addressing
Open Addressing
● Basic idea (details in Section 12.4):
■ To insert: if slot is full, try another slot, …, until
an open slot is found (probing)
■ To search, follow same sequence of probes as
would be used when inserting the element
○ If we reach an element with the correct key, return it
○ If we reach a NULL pointer, the element is not in the table
● Good for fixed sets (adding but no deletion)
■ Example: spell checking
● Table needn’t be much bigger than n
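A minimal sketch of the probing idea, assuming linear probing (try the next slot, wrapping around) and h(k) = k mod m as the starting slot. The slides do not fix a probe sequence, so both choices and all names are illustrative. Deletion is left out on purpose, matching the "adding but no deletion" use case.

```python
class LinearProbingTable:
    """Open addressing with linear probing; supports insert and search only."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m  # None plays the role of NULL

    def _probe(self, key):
        # Assumed probe sequence: h(k), h(k)+1, ..., wrapping around the table.
        start = key % self.m
        for i in range(self.m):
            yield (start + i) % self.m

    def insert(self, key, data):
        for j in self._probe(key):
            if self.slots[j] is None or self.slots[j][0] == key:
                self.slots[j] = (key, data)
                return j  # index of the slot that was used
        raise OverflowError("table is full")

    def search(self, key):
        # Follow the same probe sequence that insert would have used.
        for j in self._probe(key):
            if self.slots[j] is None:
                return None          # reached NULL: key is not in the table
            if self.slots[j][0] == key:
                return self.slots[j]
        return None                  # scanned the whole table without a match
```

Because a search stops at the first empty slot, simply clearing a slot on deletion would hide elements stored further along the same probe sequence. That is one reason the scheme suits fixed sets.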
Chaining
● Chaining puts elements that hash to the same
slot in a linked list:
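[Figure: table T in which each slot holds either NULL or a linked list of the keys from K (a subset of the universe of keys U) that hash to that slot; for example, k1 and k4 share one list, and k8 and k6 share another.]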
● How do we insert an element?
● How do we delete an element?
■ Do we need a doubly-linked list for efficient delete?
● How do we search for an element with a
given key?
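The three chaining questions (insert, delete, search) can be answered with a small sketch. It assumes singly linked chains and h(k) = k mod m; the class and method names are hypothetical.

```python
class Node:
    def __init__(self, key, data, next_node=None):
        self.key = key
        self.data = data
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m  # one (possibly empty) chain per slot

    def _slot(self, key):
        return key % self.m  # assumption: division-method hash function

    def insert(self, key, data):
        # O(1): push onto the head of the chain (assumes key is not already present).
        j = self._slot(key)
        self.heads[j] = Node(key, data, self.heads[j])

    def search(self, key):
        # Walk the chain for the key's slot; cost is proportional to chain length.
        node = self.heads[self._slot(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def delete(self, key):
        # With a singly linked chain we must walk it to find the predecessor.
        j = self._slot(key)
        prev, node = None, self.heads[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.heads[j] = node.next
        else:
            prev.next = node.next
        return True
```

On the doubly-linked-list question: when delete is handed a pointer to the element rather than its key, a doubly linked chain lets it unlink the node in O(1) without the walk shown here. When delete is given only the key, the search dominates either way.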
Analysis of Chaining
● Assume simple uniform hashing: each key in
table is equally likely to be hashed to any slot
● Given n keys and m slots in the table, the
load factor α = n/m = average # keys per slot
● What will be the average cost of an
unsuccessful search for a key? A: O(1+α)
● What will be the average cost of a successful
search? A: O(1 + α/2) = O(1 + α)
Analysis of Chaining Continued
● So the cost of searching = O(1 + α)
● If the number of keys n is proportional to the
number of slots in the table, what is α?
● A: α = O(1)
■ In other words, we can make the expected cost of
searching constant if we make α constant
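The O(1 + α) claim can be checked empirically. The sketch below simulates simple uniform hashing by dropping n keys into uniformly random slots, then measures how many elements an unsuccessful search would examine on average. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random


def average_unsuccessful_cost(n, m, trials=10_000, seed=0):
    """Estimate the mean number of elements examined by an unsuccessful search."""
    rng = random.Random(seed)
    chain_len = [0] * m
    for _ in range(n):
        chain_len[rng.randrange(m)] += 1  # simple uniform hashing
    # An absent key hashes to a uniformly random slot and scans its whole chain.
    total = sum(chain_len[rng.randrange(m)] for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for n, m in [(1_000, 1_000), (100_000, 100_000), (4_000, 2_000)]:
        print(n, m, n / m, average_unsuccessful_cost(n, m))
```

Growing n and m together leaves the measured cost near α, which is the sense in which keeping α constant keeps the expected search cost constant.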
Choosing A Hash Function
● Clearly choosing the hash function well is
crucial
■ What will a worst-case hash function do?
■ What will be the time to search in this case?
● What are desirable features of the hash
function?
■ Should distribute keys uniformly into slots
■ Should not depend on patterns in the data
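As a small illustration of the worst case: a hash function that sends every key to the same slot puts all n keys in one chain, so a search degrades to a linear scan. The helper names below are hypothetical.

```python
def chain_lengths(keys, m, h):
    """Return the number of keys that land in each of the m slots under h."""
    lengths = [0] * m
    for k in keys:
        lengths[h(k, m)] += 1
    return lengths


def constant_hash(k, m):
    return 0  # worst case: every key collides in slot 0


def division_hash(k, m):
    return k % m  # spreads consecutive keys evenly across slots


if __name__ == "__main__":
    keys = list(range(100))
    print(max(chain_lengths(keys, 10, constant_hash)))  # longest chain: 100
    print(max(chain_lengths(keys, 10, division_hash)))  # longest chain: 10
```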
Hash Functions:
The Division Method
● h(k) = k mod m
■ In words: hash k into a table with m slots using the
slot given by the remainder of k divided by m
● What happens to elements with adjacent
values of k?
● What happens if m is a power of 2 (say 2^p)?
● What if m is a power of 10?
● Upshot: pick table size m = prime number not
too close to a power of 2 (or 10)
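A short experiment makes the questions above concrete. The sample keys are made up for illustration, and 701 is just one example of a prime that is not close to a power of 2.

```python
def slots_used(keys, m):
    """Division method: slot for each key is k mod m."""
    return [k % m for k in keys]


# Hypothetical keys that differ only in their high-order bits.
HEX_KEYS = [0x1200, 0x3400, 0x5600, 0x7800]
# Hypothetical keys that differ only in their leading decimal digits.
DEC_KEYS = [1100, 2100, 3100]

if __name__ == "__main__":
    print(slots_used(HEX_KEYS, 2 ** 8))  # m = 2^8 keeps only the low 8 bits
    print(slots_used(DEC_KEYS, 10 ** 2)) # m = 10^2 keeps only the last 2 digits
    print(slots_used(HEX_KEYS, 701))     # a prime m mixes in all the bits
```

With m = 2^p the hash is just the low p bits of k, and with m = 10^p it is the last p decimal digits, so any pattern in those positions turns directly into collisions. Adjacent keys, by contrast, land in adjacent slots, which is harmless.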
Hash Functions:
The Multiplication Method
● For a constant A, 0 < A < 1:
● h(k) = ⌊m (kA − ⌊kA⌋)⌋
■ What does the term kA − ⌊kA⌋ represent? The fractional part of kA
● Choose m = 2^p
● Choose A not too close to 0 or 1
● Knuth: a good choice is A = (√5 − 1)/2
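The formula translates almost directly into code. This sketch uses floating-point arithmetic for readability, which is an assumption that only holds up for moderately sized keys; the function name is hypothetical.

```python
import math

A = (math.sqrt(5) - 1) / 2  # Knuth's suggested constant, about 0.618


def h_mul(k, m, a=A):
    """Multiplication method: floor(m * fractional part of k*a)."""
    frac = (k * a) % 1.0          # kA - floor(kA), the fractional part of kA
    return math.floor(m * frac)   # scale into 0..m-1


if __name__ == "__main__":
    print(h_mul(123456, 2 ** 14))  # 67
```

For very large keys the product k·A loses low-order bits in a double, so practical implementations usually do the same computation in fixed-point integer arithmetic. With m = 2^p that reduces to a multiply and a shift.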
Hash Functions:
Worst Case Scenario
● Scenario:
■ You are given an assignment to implement hashing
■ You will self-grade in pairs, testing and grading
your partner’s implementation
■ In a blatant violation of the honor code, your
partner:
○ Analyzes your hash function
○ Picks a sequence of “worst-case” keys, causing your
implementation to take O(n) time to search
● What’s an honest CS student to do?
Hash Functions:
Universal Hashing
● As before, when attempting to foil a
malicious adversary: randomize the algorithm
● Universal hashing: pick a hash function
randomly in a way that is independent of the
keys that are actually going to be stored
■ Guarantees good performance on average, no
matter what keys adversary chooses
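The slide does not name a particular family of hash functions. As one hedged illustration, the sketch below uses the classic family h(k) = ((a·k + b) mod p) mod m, with a and b drawn at random when the table is created. The prime p = 2^31 − 1, the requirement that all keys be smaller than p, and the function name are assumptions of this sketch.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumption: every key is smaller than P


def make_universal_hash(m, rng=random):
    """Pick one hash function at random, independently of the keys to be stored."""
    a = rng.randrange(1, P)  # a in 1..P-1
    b = rng.randrange(0, P)  # b in 0..P-1

    def h(k):
        return ((a * k + b) % P) % m

    return h


if __name__ == "__main__":
    h = make_universal_hash(10)
    # 3 and 13 always collide under k mod 10; here they collide only if the
    # random draw of (a, b) happens to make them.
    print(h(3), h(13))
```

Because a and b are chosen after the adversary has committed to the keys (or are at least unknown to the adversary), any fixed pair of distinct keys collides with probability at most about 1/m over the random choice, so no single key sequence is bad for every draw.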
The End
