Rabin-Karp Algorithm
Dr. Kiran K
Assistant Professor
Department of CSE
UVCE
Bengaluru, India.
Introduction
• It makes use of elementary number-theoretic notions such as the Equivalence of Two
Numbers Modulo a Third Number.
• Σ = {0, 1, . . . , 9} : Each character is a decimal digit, i.e. a digit in radix-d notation with d = 10.
• d = |Σ|.
• A string of k consecutive characters represents a length-k decimal number.
• P [1 . . m] : Pattern
• p : Decimal value corresponding to the pattern P [1 . . m].
• T [1 . . n] : Text
• $t_s$ : Decimal value of the length-m substring T [s + 1 . . s + m], for
s = 0, 1, . . . , n – m.
• $t_s = p$ if and only if T [s + 1 . . s + m] = P [1 . . m]
→ s is a valid shift iff $t_s = p$
Computing p:
• p can be computed in Θ (m) time using Horner’s Rule:
$p = P[m] + 10\,(P[m-1] + 10\,(P[m-2] + \cdots + 10\,(P[2] + 10\,P[1]) \cdots))$
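Horner's rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the pattern is given as a string of decimal digits; the function name `horner_value` is hypothetical and not part of the slides.

```python
def horner_value(digits, d=10):
    """Return the radix-d value of a digit string using Horner's rule.

    Assumes every character of `digits` is a decimal digit smaller than d.
    """
    value = 0
    for ch in digits:                 # most significant digit first
        value = d * value + int(ch)   # one multiply and one add per digit
    return value


print(horner_value("31415"))  # prints 31415
```

The loop runs once per digit, which is where the Θ (m) bound comes from.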
Computing $t_0, t_1, \ldots, t_{n-m}$:
• $t_0$ can be computed from T [1 . . m] in Θ (m) time using Horner’s rule.
• However, $t_{s+1}$ can be computed from $t_s$ in constant time:
$t_{s+1} = 10\,(t_s - 10^{m-1}\,T[s+1]) + T[s+m+1]$
→ $t_1, t_2, \ldots, t_{n-m}$ can be computed in Θ (n – m) time.
E.g.: T [1 . . 6] = 314152 and pattern length m = 5
Computing $t_0$ using Horner’s Rule:
$t_0$ = T [5] + 10 (T [4] + 10 (T [3] + 10 (T [2] + 10 (T [1]))))
= 5 + 10 (1 + 10 (4 + 10 (1 + 10 (3))))
= 5 + 10 (1 + 10 (4 + 10 (1 + 30)))
= 5 + 10 (1 + 10 (4 + 10 (31)))
= 5 + 10 (1 + 10 (4 + 310))
= 5 + 10 (1 + 10 (314))
= 5 + 10 (1 + 3140)
= 5 + 10 (3141)
= 5 + 31410
= 31415
Computing $t_1$ using $t_{s+1} = 10\,(t_s - 10^{m-1}\,T[s+1]) + T[s+m+1]$:
s = 0, m = 5, T [1 . . 6] = 314152, $t_0$ = 31415
$t_1 = 10\,(t_0 - 10^{5-1}\,T[1]) + T[0+5+1]$
$= 10\,(t_0 - 10^{4}\,T[1]) + T[6]$
= 10 (31415 – 10000 (3)) + 2
= 10 (31415 – 30000) + 2
= 10 (1415) + 2
= 14150 + 2
= 14152
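The Horner initialisation and the constant-time update worked through above can be sketched together as follows. This is an illustrative sketch, assuming a decimal digit string and 0-based Python indexing (so T [s + 1] becomes `text[s]`); the name `window_values` is hypothetical.

```python
def window_values(text, m):
    """Return [t_0, t_1, ..., t_{n-m}] for a decimal digit string."""
    n = len(text)
    if m <= 0 or m > n:
        return []
    high = 10 ** (m - 1)          # precomputed 10^(m-1)
    t = 0
    for ch in text[:m]:           # t_0 by Horner's rule, Theta(m)
        t = 10 * t + int(ch)
    values = [t]
    for s in range(n - m):        # each update uses a fixed number of operations
        # Drop the leading digit text[s], shift left, append text[s + m].
        t = 10 * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


print(window_values("314152", 5))  # prints [31415, 14152]
```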
• If $10^{m-1}$ is precomputed, then each of $t_1, t_2, \ldots, t_{n-m}$ can be computed from its predecessor with a constant number of arithmetic operations.
• Running Time:
  - p can be computed in Θ (m) time.
  - $t_0, t_1, \ldots, t_{n-m}$ can be computed in Θ (n – m + 1) time.
  - Hence, all occurrences of pattern P [1 . . m] in the text T [1 . . n] can be found
with Θ (m) preprocessing time and Θ (n – m + 1) matching time.
Note:
If p and $t_s$ are too large to fit in a computer word, arithmetic operations on them can no longer be assumed to take constant time.
Solution:
• Compute p and the $t_s$ values modulo a suitable modulus q.
• If q is chosen to be a prime such that 10q fits within one computer word, then all
necessary computations can be performed with single-precision arithmetic.
• In general, with a d-ary alphabet {0, 1, . . . , d – 1}, q has to be chosen so that dq fits
within a computer word.
  - $t_{s+1}$ can then be computed as follows:
$t_{s+1} = (d\,(t_s - T[s+1]\,h) + T[s+m+1]) \bmod q$
where $h \equiv d^{m-1} \pmod{q}$ is the value of the digit "1" in the high-order position of an m-digit text window.
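The modular update can be sketched as a single function. This is an illustration only: the helper name `roll` and the sample values d = 10, m = 5, q = 13 are assumptions chosen for the demonstration.

```python
def roll(t_s, out_digit, in_digit, d, h, q):
    """Return t_{s+1} = (d * (t_s - out_digit * h) + in_digit) mod q."""
    # Python's % yields a result in [0, q) for q > 0 even when the bracketed
    # value is negative. In languages where % can return a negative remainder
    # (for example C or Java), the result needs normalising.
    return (d * (t_s - out_digit * h) + in_digit) % q


d, m, q = 10, 5, 13                  # illustrative values, q assumed prime
h = pow(d, m - 1, q)                 # three-argument pow: d^(m-1) mod q
t_s = 31415 % q                      # value of the window "31415"
t_next = roll(t_s, 3, 2, d, h, q)    # slide the window to "14152"
print(h, t_s, t_next, 14152 % q)     # prints 3 7 8 8
```

The intermediate value here is negative (10 · (7 – 9) + 2 = –18), which is why the sign behaviour of the remainder operator matters.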
Example
Text : 2359023141526739921
n : 19
Pattern : 31415
m : 5
q : 13
Computing p and $t_0$ using Horner’s Rule:
P [m] + 10 (P [m - 1] + 10 (P [m - 2] + . . . + 10 (P [2] + 10 P [1]) . . .))
p = 5 + 10 (1 + 10 (4 + 10 (1 + 10 (3)))) (mod 13) = 31415 mod 13 = 7
$t_0$ = 0 + 10 (9 + 10 (5 + 10 (3 + 10 (2)))) (mod 13) = 23590 mod 13 = 8
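The same two values can be obtained by reducing modulo q after every Horner step, so no intermediate result grows beyond roughly dq. A minimal sketch, assuming decimal digit strings; `horner_mod` is a hypothetical helper name.

```python
def horner_mod(digits, d, q):
    """Return the radix-d value of a digit string, reduced modulo q."""
    value = 0
    for ch in digits:
        value = (d * value + int(ch)) % q   # stays below q after each step
    return value


text = "2359023141526739921"
pattern = "31415"
m, d, q = len(pattern), 10, 13

p = horner_mod(pattern, d, q)       # pattern value mod 13
t0 = horner_mod(text[:m], d, q)     # first window "23590" mod 13
print(p, t0)                        # prints 7 8
```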
Computing $t_{s+1}$:
$t_{s+1} = (d\,(t_s - T[s+1]\,h) + T[s+m+1]) \bmod q$, where $h \equiv d^{m-1} \pmod{q}$
$h = 10^{4} \bmod 13 = 3$
s = 0, $t_0$ = 8, T [1 . . 6] = 235902
$t_1 = (10\,(8 - 2 \cdot 3) + 2) \bmod 13 = (10 \cdot 2 + 2) \bmod 13 = 22 \bmod 13 = 9$
s = 1, $t_1$ = 9, T [2 . . 7] = 359023
$t_2 = (10\,(9 - 3 \cdot 3) + 3) \bmod 13 = (10 \cdot 0 + 3) \bmod 13 = 3 \bmod 13 = 3$
s = 2, $t_2$ = 3, T [3 . . 8] = 590231
$t_3 = (10\,(3 - 5 \cdot 3) + 1) \bmod 13 = (10 \cdot (-12) + 1) \bmod 13 = -119 \bmod 13 = 11$
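Continuing the same update for every shift gives the full list of window values modulo 13. The sketch below assumes a decimal digit string and 0-based indexing; the function name `rolling_values_mod` is hypothetical.

```python
def rolling_values_mod(text, m, d=10, q=13):
    """Return [t_0 mod q, ..., t_{n-m} mod q] for a digit string."""
    n = len(text)
    if m <= 0 or m > n:
        return []
    h = pow(d, m - 1, q)              # d^(m-1) mod q
    t = 0
    for ch in text[:m]:               # t_0 by Horner's rule, reduced mod q
        t = (d * t + int(ch)) % q
    values = [t]
    for s in range(n - m):
        t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
        values.append(t)
    return values


print(rolling_values_mod("2359023141526739921", 5))
# prints [8, 9, 3, 11, 0, 1, 7, 8, 4, 5, 10, 11, 7, 9, 11]
```

The value 7 = p mod 13 appears at shifts 6 and 12, so only those two windows need a character-by-character check.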
• $t_s \equiv p \pmod{q}$ : Hit
• $t_s \equiv p \pmod{q}$ but $t_s \ne p$ : Spurious Hit
• $t_s \not\equiv p \pmod{q}$ → $t_s \ne p$ : Invalid Shift
• $t_s \equiv p \pmod{q}$ and P [1 . . m] = T [s + 1 . . s + m] : Valid Shift, s.
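The three cases can be illustrated on the example text. For brevity this sketch recomputes each window value directly instead of rolling it, which is an assumption made for readability and not how the matcher itself works; `classify_shifts` is a hypothetical name.

```python
def classify_shifts(text, pattern, q):
    """Split the hits (t_s = p mod q) into valid shifts and spurious hits."""
    m = len(pattern)
    p = int(pattern) % q
    valid, spurious = [], []
    for s in range(len(text) - m + 1):
        window = text[s:s + m]
        if int(window) % q != p:
            continue                  # invalid shift: no comparison needed
        if window == pattern:
            valid.append(s)           # hit confirmed by explicit comparison
        else:
            spurious.append(s)        # same residue, different digits
    return valid, spurious


valid, spurious = classify_shifts("2359023141526739921", "31415", 13)
print(valid, spurious)                # prints [6] [12]
```

Shift 12 corresponds to the window 67399, which is also congruent to 7 modulo 13 but differs from the pattern.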
Algorithm
RABIN-KARP-MATCHER (T, P, d, q)
n = T.length
m = P.length
h = d^(m – 1) mod q
p = 0
t_0 = 0
For (i = 1 to m)
    p = (d p + P [i]) mod q
    t_0 = (d t_0 + T [i]) mod q
For (s = 0 to n – m)
    If (p == t_s)
        If (P [1 . . m] == T [s + 1 . . s + m])
            Print “Pattern occurs with shift ” s
    If (s < n – m)
        t_(s+1) = (d (t_s – T [s + 1] h) + T [s + m + 1]) mod q
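The pseudocode above translates fairly directly into Python. The version below is a sketch under stated assumptions: text and pattern are strings of decimal digits smaller than d, indices are 0-based, shifts are returned in a list instead of printed, and an empty pattern is treated as having no matches.

```python
def rabin_karp_matcher(text, pattern, d=10, q=13):
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []                     # assumption: nothing to report
    h = pow(d, m - 1, q)              # d^(m-1) mod q
    p = 0
    t = 0
    for i in range(m):                # preprocessing, Theta(m)
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    shifts = []
    for s in range(n - m + 1):
        # A hit is only a candidate; confirm it with an explicit comparison.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return shifts


print(rabin_karp_matcher("2359023141526739921", "31415"))  # prints [6]
```

For general character data, `int(ch)` could be replaced by `ord(ch)` together with a larger d and q; that variation is not covered by the slides.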
Running Time:
Preprocessing Time: Θ (m)
Matching Time: Θ ((n – m + 1) m) in the worst case
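The worst case arises when every shift is a hit, for example when text and pattern consist of one repeated digit. The sketch below counts the character comparisons spent verifying hits; the function name and the test inputs are illustrative assumptions.

```python
def count_verification_comparisons(text, pattern, d=10, q=13):
    """Count character comparisons made while verifying hits."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return 0
    h = pow(d, m - 1, q)
    p = 0
    t = 0
    for i in range(m):
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    comparisons = 0
    for s in range(n - m + 1):
        if p == t:                            # hit: verify digit by digit
            for j in range(m):
                comparisons += 1
                if text[s + j] != pattern[j]:
                    break                     # spurious hit detected
        if s < n - m:
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return comparisons


print(count_verification_comparisons("1" * 20, "1" * 5))               # prints 80
print(count_verification_comparisons("2359023141526739921", "31415"))  # prints 6
```

With n = 20 and m = 5 every one of the 16 shifts is valid, giving (n – m + 1) m = 80 comparisons, whereas the example text needs only 6.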
References:
• Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein,
Introduction to Algorithms, Third Edition, The MIT Press, Cambridge,
Massachusetts; London, England.
