String processing
algorithms
Strings
• Let Σ be an alphabet, e.g. Σ = {‘ ’, a, b, c, …, z}
• A string is any member of Σ*, i.e. any sequence of 0 or more members of Σ
• ‘this is a string’ ∈ Σ*
• ‘this is also a string’ ∈ Σ*
• ‘1234’ ∉ Σ*
String operations
• Given strings s1 of length n and s2 of length m
• Equality: is s1 = s2? (case sensitive or insensitive)
• Running time
O(min(n, m)): at most one comparison per character of the shorter string
‘this is a string’ = ‘this is a string’
‘this is a string’ ≠ ‘this is another string’
‘this is a string’ =? ‘THIS IS A STRING’
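The equality test above could be sketched as follows. The function name `equal` and the `case_sensitive` flag are assumptions made for illustration; case-insensitive comparison is approximated here with `str.casefold`.

```python
def equal(s1: str, s2: str, case_sensitive: bool = True) -> bool:
    """Compare two strings character by character."""
    if not case_sensitive:
        # Fold case first so that 'A' and 'a' compare equal.
        s1, s2 = s1.casefold(), s2.casefold()
    if len(s1) != len(s2):
        return False
    # At most one comparison per position; stops at the first mismatch.
    for a, b in zip(s1, s2):
        if a != b:
            return False
    return True


print(equal("this is a string", "this is a string"))                          # True
print(equal("this is a string", "this is another string"))                    # False
print(equal("this is a string", "THIS IS A STRING"))                          # False
print(equal("this is a string", "THIS IS A STRING", case_sensitive=False))    # True
```

Whether the third example on the slide counts as equal therefore depends on which mode is chosen.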
String operations
• Concatenate (append): create string s1s2
• Running time
Θ(n+m)
‘this is a’ . ‘ string’ → ‘this is a string’
String operations
• Substitute: Exchange all occurrences of a particular character with another character
• Running time
Θ(n)
Substitute(‘this is a string’, ‘i’, ‘x’) → ‘thxs xs a strxng’
Substitute(‘banana’, ‘a’, ‘o’) → ‘bonono’
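Concatenation and substitution can be sketched as explicit single passes, which makes the Θ(n+m) and Θ(n) running times visible. The function names below are illustrative only; in practice Python's built-in `+` and `str.replace` do the same work.

```python
def concatenate(s1: str, s2: str) -> str:
    """Build s1s2 by copying all n + m characters once."""
    out = []
    out.extend(s1)  # copy the n characters of s1
    out.extend(s2)  # copy the m characters of s2
    return "".join(out)


def substitute(s: str, old: str, new: str) -> str:
    """Replace every occurrence of the character old with new in one pass over s."""
    return "".join(new if ch == old else ch for ch in s)


print(concatenate("this is a", " string"))      # this is a string
print(substitute("this is a string", "i", "x"))  # thxs xs a strxng
print(substitute("banana", "a", "o"))            # bonono
```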
String operations
Length(‘this is a string’) → 16
Length(‘this is another string’) → 22
String operations
Prefix(‘this is a string’, 4) → ‘this’
Suffix(‘this is a string’, 6) → ‘string’
String operations
Substring(‘this is a string’, 4, 8) → ‘s is ’
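A short sketch of these operations using Python slicing. The helper names are illustrative, and `substring` assumes 1-based, inclusive positions, which is the reading under which the slide's example yields ‘s is ’.

```python
def prefix(s: str, k: int) -> str:
    """First k characters of s."""
    return s[:k]


def suffix(s: str, k: int) -> str:
    """Last k characters of s (written so that k = 0 gives the empty string)."""
    return s[len(s) - k:]


def substring(s: str, i: int, j: int) -> str:
    """Characters at positions i..j of s, assuming 1-based inclusive positions."""
    return s[i - 1:j]


text = "this is a string"
print(len(text))                 # 16
print(prefix(text, 4))           # this
print(suffix(text, 6))           # string
print(repr(substring(text, 4, 8)))  # 's is '
```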
Edit distance (important for the exam)
(aka Levenshtein distance)
• Edit distance between two strings is the minimum number of insertions, deletions and substitutions required to transform string s1 into string s2
Insertion:
ABACED → ABACCED → DABACCED
Insert ‘C’, then insert ‘D’
Deletion:
ABACED → BACED → BACE
Delete ‘A’, then delete ‘D’
Substitution:
ABACED → ABADED → ABADES
Sub ‘D’ for ‘C’, then sub ‘S’ for ‘D’
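The three edit operations can be sketched as small functions on Python strings. The names and the 0-based position argument are assumptions for illustration; the slides only show the before and after strings.

```python
def insert_at(s: str, i: int, ch: str) -> str:
    """Insert character ch so that it ends up at 0-based position i."""
    return s[:i] + ch + s[i:]


def delete_at(s: str, i: int) -> str:
    """Delete the character at 0-based position i."""
    return s[:i] + s[i + 1:]


def substitute_at(s: str, i: int, ch: str) -> str:
    """Replace the character at 0-based position i with ch."""
    return s[:i] + ch + s[i + 1:]


# Insertion example from the slides.
print(insert_at(insert_at("ABACED", 3, "C"), 0, "D"))          # DABACCED
# Deletion example from the slides.
print(delete_at(delete_at("ABACED", 0), 4))                    # BACE
# Substitution example from the slides.
print(substitute_at(substitute_at("ABACED", 3, "D"), 5, "S"))  # ABADES
```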
Edit distance examples
Edit(Kitten, Mitten) = 1
Operations:
Sub ‘M’ for ‘K’ → Mitten
Edit distance examples
Edit(Happy, Hilly) = 3
Operations:
Sub ‘i’ for ‘a’ → Hippy
Sub ‘l’ for ‘p’ → Hilpy
Sub ‘l’ for ‘p’ → Hilly
Edit distance examples
Edit(Banana, Car) = 5
Operations:
Delete ‘B’ → anana
Delete ‘a’ → nana
Delete ‘n’ → naa
Sub ‘C’ for ‘n’ → Caa
Sub ‘r’ for ‘a’ → Car
Edit distance examples
Edit(Simple, Apple) = 3 (number of operations needed)
Operations:
Delete ‘S’ → imple
Sub ‘A’ for ‘i’ → Ample
Sub ‘p’ for ‘m’ → Apple
Is edit distance symmetric (reversible)?
• that is, is Edit(s1, s2) = Edit(s2, s1)?
• Why? Each operation can be undone by exactly one operation:
• sub ‘i’ for ‘j’ → sub ‘j’ for ‘i’
• delete ‘i’ → insert ‘i’
• insert ‘i’ → delete ‘i’
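One way to see the symmetry argument concretely is to invert an edit script: undo each operation, in reverse order, with its counterpart. The sketch below uses a hypothetical tuple encoding of operations (`("ins", i, ch)`, `("del", i, ch)`, `("sub", i, old, new)`, with 0-based positions); none of these names come from the slides.

```python
def apply_ops(s: str, ops: list) -> str:
    """Apply a sequence of edit operations to s, left to right."""
    chars = list(s)
    for op in ops:
        kind, i = op[0], op[1]
        if kind == "ins":
            chars.insert(i, op[2])
        elif kind == "del":
            del chars[i]
        elif kind == "sub":
            chars[i] = op[3]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return "".join(chars)


def invert_ops(ops: list) -> list:
    """Return a script of the same length that undoes ops."""
    inverse = []
    for op in reversed(ops):
        kind, i = op[0], op[1]
        if kind == "ins":
            inverse.append(("del", i, op[2]))         # insert -> delete
        elif kind == "del":
            inverse.append(("ins", i, op[2]))         # delete -> insert
        elif kind == "sub":
            inverse.append(("sub", i, op[3], op[2]))  # swap old and new
        else:
            raise ValueError(f"unknown operation: {kind}")
    return inverse


script = [("del", 0, "s"), ("sub", 0, "i", "a"), ("sub", 1, "m", "p")]
print(apply_ops("simple", script))             # apple
print(apply_ops("apple", invert_ops(script)))  # simple
```

Because the inverted script has the same number of operations, a shortest script from s1 to s2 gives a script of equal length from s2 to s1, and vice versa.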
Calculating edit distance
X = A B C B D A B
Y = B D C A B A
Ideas?
Calculating edit distance
X = A B C B D A ?
Y = B D C A B ?
After all of the operations, X needs
to equal Y
Operations: Insert, Delete, Substitute
Insert
X = A B C B D A ?
Y = B D C A B ?
$\mathrm{Edit}(X, Y) = 1 + \mathrm{Edit}(X_{1 \ldots n}, Y_{1 \ldots m-1})$
Delete
X = A B C B D A ?
Y = B D C A B ?
$\mathrm{Edit}(X, Y) = 1 + \mathrm{Edit}(X_{1 \ldots n-1}, Y_{1 \ldots m})$
Substitution
X = A B C B D A ?
Y = B D C A B ?
$\mathrm{Edit}(X, Y) = 1 + \mathrm{Edit}(X_{1 \ldots n-1}, Y_{1 \ldots m-1})$
Anything else?
X = A B C B D A ?
Y = B D C A B ?
Equal
X = A B C B D A ?
Y = B D C A B ?
$\mathrm{Edit}(X, Y) = \mathrm{Edit}(X_{1 \ldots n-1}, Y_{1 \ldots m-1})$
Combining results
Insert: $\mathrm{Edit}(X, Y) = 1 + \mathrm{Edit}(X_{1 \ldots n}, Y_{1 \ldots m-1})$
Delete: $\mathrm{Edit}(X, Y) = 1 + \mathrm{Edit}(X_{1 \ldots n-1}, Y_{1 \ldots m})$
Substitute: $\mathrm{Edit}(X, Y) = 1 + \mathrm{Edit}(X_{1 \ldots n-1}, Y_{1 \ldots m-1})$
Equal: $\mathrm{Edit}(X, Y) = \mathrm{Edit}(X_{1 \ldots n-1}, Y_{1 \ldots m-1})$
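The four cases can be combined by taking the minimum over those that apply and filling a table bottom-up. The following is a sketch under two assumptions the slides do not spell out: the base cases are Edit(X, empty) = n and Edit(empty, Y) = m, and the Equal case is used only when the last characters match. The table name `dist` is illustrative.

```python
def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance via dynamic programming, Theta(n * m) time."""
    n, m = len(x), len(y)
    # dist[i][j] holds Edit(x[:i], y[:j]).
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i characters of the prefix of x
    for j in range(m + 1):
        dist[0][j] = j  # insert all j characters of the prefix of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            insert = 1 + dist[i][j - 1]
            delete = 1 + dist[i - 1][j]
            if x[i - 1] == y[j - 1]:
                diagonal = dist[i - 1][j - 1]      # Equal: no operation needed
            else:
                diagonal = 1 + dist[i - 1][j - 1]  # Substitute
            dist[i][j] = min(insert, delete, diagonal)
    return dist[n][m]


print(edit_distance("kitten", "mitten"))  # 1
print(edit_distance("happy", "hilly"))    # 3
print(edit_distance("banana", "car"))     # 5
print(edit_distance("simple", "apple"))   # 3
```

The printed values agree with the worked examples earlier in the deck.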
Rabin-Karp algorithm
P = ABA
S = BABABBABABA
- Use a function T that computes a numerical representation of P
- Calculate T for all m-symbol sequences of S and compare
Hash P: T(P)
Hash m-symbol sequences of S and compare each with T(P):
T(BAB) = T(P)?
T(ABA) = T(P)? match
T(BAB) = T(P)?
…
Rabin-Karp algorithm
P = ABA
S = BABABBABABA
For this to be useful/efficient, what needs to be true about T?
• Given $T(s_{i \ldots i+m-1})$ we must be able to efficiently calculate $T(s_{i+1 \ldots i+m})$
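One common choice of T that satisfies this requirement is a polynomial hash taken modulo a prime, which can be rolled from one window to the next in constant time. The slides do not specify T, so the sketch below is an assumption: `base`, `mod`, and the function name `rabin_karp` are illustrative. Equal hashes are confirmed with a direct comparison because different windows can collide.

```python
def rabin_karp(pattern: str, text: str, base: int = 256,
               mod: int = 1_000_000_007) -> list:
    """Return the 0-based start positions of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []  # design choice: an empty pattern reports no matches

    def T(s: str) -> int:
        # Polynomial hash of s, computed from scratch in Theta(len(s)).
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    high = pow(base, m - 1, mod)  # weight of the window's leading symbol
    target = T(pattern)
    window = T(text[:m])
    matches = []
    for i in range(n - m + 1):
        # Verify on a hash hit to rule out collisions.
        if window == target and text[i:i + m] == pattern:
            matches.append(i)
        if i + m < n:
            # Roll: drop text[i], shift by one symbol, append text[i + m].
            window = ((window - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches


print(rabin_karp("ABA", "BABABBABABA"))  # [1, 6, 8]
```

Each roll costs O(1), so all n − m + 1 windows are hashed in O(n) total, compared with O(nm) if T were recomputed from scratch for every window.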

Strings matching in pattern recognition.ppt
