Knuth-Morris-Pratt Algorithm
Prepared by: Mayank Agarwal
Nitesh Maan
The problem of String Matching
Given a string ‘S’, the problem of string matching is to determine whether a pattern ‘p’ occurs in ‘S’ and, if it does, to return the position(s) in ‘S’ where ‘p’ occurs.
An O(mn) approach
One of the most obvious approaches to the string matching problem is to compare the first element of the pattern ‘p’ with the first element of the string ‘S’ in which ‘p’ is to be located. If the first element of ‘p’ matches the first element of ‘S’, compare the second element of ‘p’ with the second element of ‘S’, and proceed likewise until the entire pattern ‘p’ has been matched. If a mismatch is found at any position, shift ‘p’ one position to the right and repeat the comparison beginning from the first element of ‘p’.
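The brute-force procedure just described can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name `naive_match` is hypothetical, Python strings are 0-indexed, and each returned shift s means ‘p’ lines up with ‘S’ starting at 0-based index s (the same number as the count of right shifts used in the slides).

```python
def naive_match(S: str, p: str) -> list[int]:
    """Return every shift s (0-based start index) at which p occurs in S.

    Tries each alignment of p against S in turn, comparing left to right,
    and moves p one position to the right after a mismatch or a full match.
    """
    n, m = len(S), len(p)
    shifts = []
    for s in range(n - m + 1):              # each candidate alignment of p
        j = 0
        while j < m and p[j] == S[s + j]:
            j += 1                          # characters agree: compare the next pair
        if j == m:                          # all m characters matched
            shifts.append(s)
    return shifts


# The string and pattern from the slides' illustration.
print(naive_match("abcabaabcabac", "abaa"))  # [3]
```

The single result, shift 3, corresponds to the match found after shifting ‘p’ three times to the right.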
How does the O(mn) approach work?
Below is an illustration of how the previously
described O(mn) approach works.
String S a b c a b a a b c a b a c
Pattern p a b a a
Step 1: compare p[1] with S[1] (a matches a).
S a b c a b a a b c a b a c
p a b a a
Step 2: compare p[2] with S[2] (b matches b).
S a b c a b a a b c a b a c
p a b a a
Step 3: compare p[3] with S[3].
S a b c a b a a b c a b a c
p a b a a
Mismatch occurs here (a does not match c).
Since a mismatch is detected, shift ‘p’ one position to the right and perform steps analogous to steps 1 to 3. Each time a mismatch is detected, shift ‘p’ one position to the right and repeat the matching procedure.
S a b c a b a a b c a b a c
p       a b a a
Finally, a match is found after shifting ‘p’ three times to the right.
Drawbacks of this approach: if ‘m’ is the length of pattern ‘p’ and ‘n’ is the length of string ‘S’, the matching time is O(mn) in the worst case, since each of the n − m + 1 possible shifts may require up to m comparisons. This makes the approach slow for long strings and patterns.
What makes this approach slow is that elements of ‘S’ that have already been compared are involved again and again in comparisons in later iterations. For example, when the first mismatch is detected in the comparison of p[3] with S[3], pattern ‘p’ is moved one position to the right and the matching procedure resumes from there. The first comparison that then takes place is between p[1] = ‘a’ and S[2] = ‘b’. Note that S[2] = ‘b’ had already been compared in step 2; this is a repeated use of S[2] in another comparison. It is these repeated comparisons that lead to the O(mn) running time.
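To make the O(mn) bound concrete, the sketch below counts the character comparisons the brute-force method performs. The helper name `naive_comparisons` and the choice of input are assumptions for illustration: on a string of all ‘a’s with a pattern of ‘a’s ending in ‘b’, every alignment re-reads m characters of ‘S’ before failing, so the count is (n − m + 1)·m.

```python
def naive_comparisons(S: str, p: str) -> int:
    """Count the character comparisons made by the brute-force matcher."""
    n, m = len(S), len(p)
    count = 0
    for s in range(n - m + 1):              # each alignment of p
        for j in range(m):
            count += 1                      # compare p[j] with S[s + j]
            if p[j] != S[s + j]:
                break                       # mismatch: shift p one position right
    return count


# Worst case: every alignment matches m - 1 characters, then fails on 'b'.
n, m = 1000, 10
S = "a" * n
p = "a" * (m - 1) + "b"
print(naive_comparisons(S, p))  # 9910, i.e. (1000 - 10 + 1) * 10
```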
The Knuth-Morris-Pratt Algorithm
Knuth, Morris and Pratt proposed a linear-time algorithm for the string matching problem.
A matching time of O(n), after O(m) preprocessing of the pattern, is achieved by never re-examining characters of ‘S’ that have already been matched against the pattern ‘p’: the index into ‘S’ only moves forward, i.e., backtracking on the string ‘S’ never occurs.
Components of the KMP algorithm
• The prefix function, Π
The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. Formally, Π[q] is the length of the longest prefix of ‘p’ that is also a proper suffix of p[1..q]. This information can be used to avoid useless shifts of the pattern ‘p’; in other words, it makes it possible to avoid backtracking on the string ‘S’.
• The KMP Matcher
With string ‘S’, pattern ‘p’ and prefix function ‘Π’ as inputs, the matcher finds the occurrences of ‘p’ in ‘S’ and reports, for each one, the shift of ‘p’ (the number of positions it has been moved to the right) at which the occurrence is found.
The prefix function, Π
The following pseudocode computes the prefix function Π:
Compute-Prefix-Function (p)
1  m ← length[p]                 // ‘p’ is the pattern to be matched
2  Π[1] ← 0
3  k ← 0
4  for q ← 2 to m
5      do while k > 0 and p[k+1] != p[q]
6             do k ← Π[k]
7         if p[k+1] = p[q]
8            then k ← k + 1
9         Π[q] ← k
10 return Π
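A direct Python translation of Compute-Prefix-Function is sketched below, assuming Python as the target language. To keep the indices aligned with the pseudocode, Π is stored in a list of length m + 1 with Π[0] unused, and the pseudocode's 1-based p[j] becomes p[j − 1]; the function name `compute_prefix_function` is our own.

```python
def compute_prefix_function(p: str) -> list[int]:
    """Return pi as a 1-indexed list: pi[q] for q = 1..m (pi[0] is unused).

    Python strings are 0-indexed, so the pseudocode's p[j] is p[j - 1] here.
    """
    m = len(p)
    pi = [0] * (m + 1)                      # step 2: pi[1] = 0
    k = 0                                   # step 3
    for q in range(2, m + 1):               # step 4
        while k > 0 and p[k] != p[q - 1]:   # step 5: p[k+1] != p[q]
            k = pi[k]                       # step 6: fall back to a shorter prefix
        if p[k] == p[q - 1]:                # step 7: p[k+1] = p[q]
            k += 1                          # step 8
        pi[q] = k                           # step 9
    return pi                               # step 10


print(compute_prefix_function("ababaca")[1:])  # [0, 0, 1, 2, 3, 0, 1]
```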
Example: compute Π for the pattern ‘p’ below:
p a b a b a c a
Initially: m = length[p] = 7, Π[1] = 0, k = 0.
Step 1: q = 2, k = 0. p[1] = a does not match p[2] = b, so Π[2] = 0.
Step 2: q = 3, k = 0. p[1] = a matches p[3] = a, so k becomes 1 and Π[3] = 1.
Step 3: q = 4, k = 1. p[2] = b matches p[4] = b, so k becomes 2 and Π[4] = 2.
Step 4: q = 5, k = 2. p[3] = a matches p[5] = a, so k becomes 3 and Π[5] = 3.
Step 5: q = 6, k = 3. p[4] = b does not match p[6] = c, so k ← Π[3] = 1; p[2] = b does not match c, so k ← Π[1] = 0; finally p[1] = a does not match c, so Π[6] = 0.
Step 6: q = 7, k = 0. p[1] = a matches p[7] = a, so k becomes 1 and Π[7] = 1.
After 6 iterations of the for loop, the prefix function computation is complete:
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1
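As a cross-check on the table, Π can also be computed straight from its definition: Π[q] is the length of the longest proper prefix of p[1..q] that is also a suffix of p[1..q]. The brute-force sketch below (hypothetical name, Python assumed) tries every candidate length; it takes O(m³) time, so it is only meant for checking small examples, not as a replacement for the pseudocode.

```python
def prefix_function_by_definition(p: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of p[1..q] that is also
    a suffix of p[1..q], found by trying every candidate length."""
    m = len(p)
    pi = [0] * (m + 1)                      # 1-indexed like the slides; pi[0] unused
    for q in range(1, m + 1):
        head = p[:q]                        # p[1..q]
        for k in range(q - 1, 0, -1):       # longest proper length first
            if head[:k] == head[q - k:]:    # prefix of length k equals suffix of length k
                pi[q] = k
                break
    return pi


print(prefix_function_by_definition("ababaca")[1:])  # [0, 0, 1, 2, 3, 0, 1]
```

For q = 6 the prefix "ababac" ends in ‘c’, while every nonempty prefix of ‘p’ ends in ‘a’ or ‘b’, so no border exists and Π[6] = 0.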
The KMP Matcher
The KMP Matcher, with pattern ‘p’, string ‘S’ and prefix function ‘Π’ as input, finds the occurrences of ‘p’ in ‘S’.
The following pseudocode implements the matching component of the KMP algorithm:
KMP-Matcher(S, p)
1  n ← length[S]
2  m ← length[p]
3  Π ← Compute-Prefix-Function(p)
4  q ← 0                                  // number of characters matched
5  for i ← 1 to n                         // scan S from left to right
6      do while q > 0 and p[q+1] != S[i]
7             do q ← Π[q]                 // next character does not match
8         if p[q+1] = S[i]
9            then q ← q + 1               // next character matches
10        if q = m                        // is all of p matched?
11           then print “Pattern occurs with shift” i − m
12                q ← Π[q]                // look for the next match
Note: KMP finds every occurrence of ‘p’ in ‘S’. That is why the matcher does not stop after reporting a match in step 11; step 12 resets q to Π[m] so that the search continues through the remainder of ‘S’ for further occurrences of ‘p’.
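The KMP-Matcher pseudocode can be translated into Python as sketched below. Assumptions beyond the slides: the function names are our own, shifts are collected in a list instead of printed, the pattern must be non-empty (the pseudocode reads p[q+1], which does not exist when m = 0), and the prefix-function helper is repeated so the block runs on its own.

```python
def compute_prefix_function(p: str) -> list[int]:
    """1-indexed prefix function: pi[q] for q = 1..m, pi[0] unused."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(S: str, p: str) -> list[int]:
    """Return every shift i - m at which p occurs in S (translation of KMP-Matcher)."""
    if not p:
        raise ValueError("pattern must be non-empty")  # assumption: pseudocode needs m >= 1
    n, m = len(S), len(p)
    pi = compute_prefix_function(p)
    q = 0                                   # number of characters matched
    shifts = []
    for i in range(1, n + 1):               # scan S left to right; S[i] is S[i - 1] here
        while q > 0 and p[q] != S[i - 1]:   # p[q+1] != S[i] in 1-based terms
            q = pi[q]                       # next character does not match
        if p[q] == S[i - 1]:
            q += 1                          # next character matches
        if q == m:                          # is all of p matched?
            shifts.append(i - m)            # "Pattern occurs with shift" i - m
            q = pi[q]                       # look for the next match
    return shifts


print(kmp_matcher("bacbabababacaca", "ababaca"))  # [6]
print(kmp_matcher("aaaa", "aa"))                  # [0, 1, 2]
```

The second call shows why step 12 matters: the occurrences of "aa" in "aaaa" overlap, and resetting q to Π[m] instead of 0 lets the matcher report all three without re-reading any character of ‘S’.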
Illustration: given a string ‘S’ and a pattern ‘p’ as follows:
S b a c b a b a b a b a c a c a
p a b a b a c a
Let us execute the KMP algorithm to find whether ‘p’ occurs in ‘S’.
For ‘p’, the prefix function Π was computed previously and is as follows:
q 1 2 3 4 5 6 7
p a b a b a c a
Π 0 0 1 2 3 0 1
Initially: n = length of S = 15; m = length of p = 7.
Step 1: i = 1, q = 0. Compare p[1] with S[1]: p[1] does not match S[1], so ‘p’ is shifted one position to the right.
Step 2: i = 2, q = 0. Compare p[1] with S[2]: p[1] matches S[2]. Since there is a match, ‘p’ is not shifted.
Step 3: i = 3, q = 1. Compare p[2] with S[3]: p[2] does not match S[3]. Backtracking on ‘p’ (q ← Π[1] = 0), compare p[1] with S[3]: no match.
Step 4: i = 4, q = 0. Compare p[1] with S[4]: p[1] does not match S[4].
Step 5: i = 5, q = 0. Compare p[1] with S[5]: p[1] matches S[5].
Step 6: i = 6, q = 1. Compare p[2] with S[6]: p[2] matches S[6].
Step 7: i = 7, q = 2. Compare p[3] with S[7]: p[3] matches S[7].
Step 8: i = 8, q = 3. Compare p[4] with S[8]: p[4] matches S[8].
Step 9: i = 9, q = 4. Compare p[5] with S[9]: p[5] matches S[9].
Step 10: i = 10, q = 5. Compare p[6] with S[10]: p[6] does not match S[10]. Backtracking on ‘p’: after the mismatch q = Π[5] = 3, so compare p[4] with S[10]. p[4] matches S[10], and q becomes 4.
Step 11: i = 11, q = 4. Compare p[5] with S[11]: p[5] matches S[11].
Step 12: i = 12, q = 5. Compare p[6] with S[12]: p[6] matches S[12].
Step 13: i = 13, q = 6. Compare p[7] with S[13]: p[7] matches S[13].
Pattern ‘p’ has been found to occur completely in string ‘S’. The match occurs with shift i − m = 13 − 7 = 6, i.e., ‘p’ was shifted 6 positions to the right and occupies positions 7 to 13 of ‘S’.
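The step-by-step trace above can be reproduced mechanically by recording (i, q) at the start of each iteration of the for loop. In this sketch (hypothetical `kmp_trace`, Python assumed, non-empty pattern), the first 13 recorded pairs should line up with Steps 1 to 13 of the illustration, and the single match is reported at shift 6.

```python
def compute_prefix_function(p: str) -> list[int]:
    """1-indexed prefix function: pi[q] for q = 1..m, pi[0] unused."""
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k
    return pi


def kmp_trace(S: str, p: str) -> tuple[list[tuple[int, int]], list[int]]:
    """Run KMP-Matcher, recording (i, q) at the start of each for-loop iteration."""
    m = len(p)
    pi = compute_prefix_function(p)
    q = 0
    steps: list[tuple[int, int]] = []
    shifts: list[int] = []
    for i in range(1, len(S) + 1):
        steps.append((i, q))                # state before p[q+1] is compared with S[i]
        while q > 0 and p[q] != S[i - 1]:
            q = pi[q]
        if p[q] == S[i - 1]:
            q += 1
        if q == m:
            shifts.append(i - m)
            q = pi[q]
    return steps, shifts


steps, shifts = kmp_trace("bacbabababacaca", "ababaca")
for i, q in steps[:13]:                     # Steps 1 to 13 of the illustration
    print(f"Step {i}: i = {i}, q = {q}")
print("shifts:", shifts)                    # shifts: [6]
```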
Running-time analysis
• Compute-Prefix-Function (p)
1  m ← length[p]                 // ‘p’ is the pattern to be matched
2  Π[1] ← 0
3  k ← 0
4  for q ← 2 to m
5      do while k > 0 and p[k+1] != p[q]
6             do k ← Π[k]
7         if p[k+1] = p[q]
8            then k ← k + 1
9         Π[q] ← k
10 return Π
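In the above pseudocode for computing the prefix function, the for loop from step 4 to step 9 runs m − 1 times, and steps 1 to 3 take constant time. The while loop in steps 5 and 6 can run several times within a single iteration, so bounding the total work needs one more observation: k increases by at most 1 per iteration of the for loop (step 8), each execution of step 6 strictly decreases k (because Π[k] < k), and k never becomes negative. Hence step 6 executes at most m − 1 times in total, and the running time of Compute-Prefix-Function is Θ(m).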
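The amortized argument can be checked empirically by counting how often step 6 (k ← Π[k]) runs. The sketch below (hypothetical helper `prefix_function_with_count`, Python assumed, sample patterns chosen arbitrarily) asserts that the total never exceeds m − 1; a test like this illustrates the bound on a few inputs but does not prove it.

```python
def prefix_function_with_count(p: str) -> tuple[list[int], int]:
    """Compute-Prefix-Function that also counts executions of step 6 (k <- pi[k])."""
    m = len(p)
    pi = [0] * (m + 1)                      # 1-indexed; pi[0] unused
    k = 0
    fallbacks = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]                       # step 6: k strictly decreases, since pi[k] < k
            fallbacks += 1
        if p[k] == p[q - 1]:
            k += 1                          # step 8: k rises by at most 1 per iteration
        pi[q] = k
    return pi, fallbacks


for pattern in ["ababaca", "aaaaab", "abababababc", "aabaabaaab"]:
    _, fallbacks = prefix_function_with_count(pattern)
    # Total executions of step 6 over the whole run are bounded by m - 1.
    assert fallbacks <= len(pattern) - 1
    print(pattern, "step-6 executions:", fallbacks)
```

For "aaaaab", k climbs to 4 and then falls back four times at the final ‘b’, so a single iteration can be expensive even though the total stays within m − 1.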
• KMP Matcher
1  n ← length[S]
2  m ← length[p]
3  Π ← Compute-Prefix-Function(p)
4  q ← 0
5  for i ← 1 to n
6      do while q > 0 and p[q+1] != S[i]
7             do q ← Π[q]
8         if p[q+1] = S[i]
9            then q ← q + 1
10        if q = m
11           then print “Pattern occurs with shift” i − m
12                q ← Π[q]
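The for loop beginning in step 5 runs n times, once for each character of the string ‘S’. Steps 1, 2 and 4 take constant time, and step 3 takes Θ(m) time by the analysis above. As with the prefix function, the while loop in steps 6 and 7 needs an amortized argument: q increases by at most 1 per iteration of the for loop (step 9), each execution of step 7 or step 12 strictly decreases q, and q never becomes negative. Hence step 7 executes at most n times in total, so the matching loop runs in Θ(n) time, and the complete algorithm, including the computation of Π, runs in Θ(m + n) time.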
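Similarly, the sketch below (hypothetical `kmp_with_counts`, Python assumed, non-empty pattern) counts how often q is incremented (step 9) and how often it falls back via step 7. On the brute-force method's worst case, a string of 1000 ‘a’s and the pattern "aaaaaaaaab", the fallbacks stay below n = 1000, whereas the brute-force method performs (n − m + 1)·m = 9910 comparisons on the same input.

```python
def kmp_with_counts(S: str, p: str) -> tuple[list[int], int, int]:
    """KMP-Matcher that also returns how many times q was incremented (step 9)
    and how many times it fell back via step 7 (q <- pi[q])."""
    # Prefix function (Compute-Prefix-Function), 1-indexed with pi[0] unused.
    m = len(p)
    pi = [0] * (m + 1)
    k = 0
    for q in range(2, m + 1):
        while k > 0 and p[k] != p[q - 1]:
            k = pi[k]
        if p[k] == p[q - 1]:
            k += 1
        pi[q] = k

    q, increments, fallbacks, shifts = 0, 0, 0, []
    for i in range(1, len(S) + 1):
        while q > 0 and p[q] != S[i - 1]:
            q = pi[q]                       # step 7
            fallbacks += 1
        if p[q] == S[i - 1]:
            q += 1                          # step 9: at most once per character of S
            increments += 1
        if q == m:
            shifts.append(i - m)
            q = pi[q]                       # step 12
    return shifts, increments, fallbacks


S = "a" * 1000
p = "a" * 9 + "b"
shifts, increments, fallbacks = kmp_with_counts(S, p)
print(shifts, increments, fallbacks)        # [] 1000 991
```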
