Advanced Data Structure: Bioinformatics
• First week: Algorithms for exact string matching.
• Second week: Approximate search and alignment of short sequences.
• Third week: Dealing with long sequences.
Advanced Data Structure: Bibliography
• Bioinformatics: Sequence and Genome Analysis, David W. Mount
• Flexible Pattern Matching in Strings (2002), Gonzalo Navarro and Mathieu Raffinot
• http://www-igm.univ-mlv.fr/~lecroq/string/index.html
• http://www.ncbi.nlm.nih.gov/
First week
• First week: algorithms for exact string matching:
One pattern: the algorithm depends on the pattern length |p| and the alphabet size |Σ|
k patterns: the algorithm depends on k, |p| and |Σ|
• Second week: approximate search and alignment of short sequences.
• Third week: dealing with long sequences.
Exact string matching for one pattern
For instance, given the sequence
CTACTACTACGTCTATACTGATCGTAGCTACTACATGC
search for the pattern ACTGA,
and for the pattern TACTACGGTATGACTAA.
How do string matching algorithms perform the search?
Exact string matching: Brute force algorithm
Example: given the pattern ATGTA, the search is
G T A C T A G A G G A C G T A T G T A C T G ...
A T G T A
  A T G T A
    A T G T A
      A T G T A
        A T G T A
          A T G T A
Exact string matching: Brute force algorithm
Text :
Pattern :
• How is the comparison made? From left to right: prefix search.
• Which is the next position of the window? The window is shifted only one cell.
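The brute force scheme can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name `brute_force_search` is an assumption, and the text is the example sequence from the slides with the trailing "..." dropped.

```python
def brute_force_search(text: str, pattern: str) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for pos in range(n - m + 1):      # the window moves one cell at a time
        j = 0
        # Compare the window with the pattern from left to right.
        while j < m and text[pos + j] == pattern[j]:
            j += 1
        if j == m:                    # every character of the pattern matched
            matches.append(pos)
    return matches


text = "GTACTAGAGGACGTATGTACTG"
print(brute_force_search(text, "ATGTA"))  # 0-based start positions
```

Every window position is visited, so the cost is at most (n - m + 1) · m character comparisons; the algorithms that follow try to skip windows.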
Exact string matching: one pattern
How do the matching algorithms perform the search?
There is a sliding window along the text against which the pattern is compared:
Pattern :
Text :
At each step the comparison is made and the window is shifted to the right.
Which facts differentiate the algorithms?
1. How the comparison is made.
2. The length of the shift.
Exact string matching for one pattern
Experimental efficiency (Navarro & Raffinot)
[Figure: experimental efficiency map. Axes: alphabet size |Σ| (2 to 64) and pattern length (2 to 256). Labels: Horspool, BNDM, BOM, w.]
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
Horspool algorithm
Text :
Pattern :
• How is the comparison made? Suffix search.
• Which is the next position of the window? Let “a” be the text character under the last cell of the window. Shift until the next occurrence of “a” in the pattern.
We need a preprocessing phase to construct the shift table.
Horspool algorithm: example
Given the pattern ATGTA
• The shift table is:
A 4
C 5
G 2
T 1
• The searching phase:
G T A C T A G A G G A C G T A T G T A C T G ...
A T G T A
  A T G T A
          A T G T A
              A T G T A
                        A T G T A
                            A T G T A
                                    A T G T A
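The example above can be reproduced with a short Python sketch. The function names and the extra `windows` return value (the list of window start positions, added only to compare against the alignment shown) are assumptions for illustration; characters absent from the table, such as C here, fall back to a shift of the full pattern length.

```python
def horspool_shift_table(pattern: str) -> dict[str, int]:
    """Distance from the last occurrence of each character in
    pattern[:-1] to the end of the pattern."""
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern[:-1]):
        table[ch] = m - 1 - i         # later occurrences overwrite earlier ones
    return table


def horspool_search(text: str, pattern: str) -> tuple[list[int], list[int]]:
    """Return (match positions, window positions visited)."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    table = horspool_shift_table(pattern)
    matches, windows = [], []
    pos = 0
    while pos <= n - m:
        windows.append(pos)
        j = m - 1
        # Suffix search: compare from the right end of the window.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Shift according to the last text character of the window.
        pos += table.get(text[pos + m - 1], m)
    return matches, windows


print(horspool_shift_table("ATGTA"))
print(horspool_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))
```

On the 22 characters shown, the window visits positions 0, 1, 5, 7, 12 and 14; the seventh window in the slide starts at position 18 and needs the part of the text hidden behind "...".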
Some questions about Horspool algorithm
Given the pattern ATGTA, the shift table is
A 4
C 5
G 2
T 1
Given a random text over an equally likely probability distribution (EPD):
1.- Determine the expected shift of the window. What if the probability distribution is not equally likely?
2.- Determine the expected number of shifts, assuming a text of length n.
3.- Determine the expected number of comparisons in the suffix search phase.
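For the first question, one way to reason is that the shift depends only on the last text character of the window, so the expected shift is the probability-weighted average of the table entries. The sketch below assumes independent text characters; the helper name `expected_shift` and the AT-rich distribution are illustrative assumptions, not part of the course material.

```python
from fractions import Fraction


def expected_shift(shift_table: dict[str, int], default_shift: int,
                   probabilities: dict[str, Fraction]) -> Fraction:
    """Average shift when the last character of the window follows
    the given distribution (characters assumed independent)."""
    return sum(p * shift_table.get(ch, default_shift)
               for ch, p in probabilities.items())


table = {"A": 4, "G": 2, "T": 1}      # shift table of ATGTA; C is absent, so it shifts by 5
uniform = {ch: Fraction(1, 4) for ch in "ACGT"}
at_rich = {"A": Fraction(3, 10), "T": Fraction(3, 10),
           "C": Fraction(2, 10), "G": Fraction(2, 10)}   # hypothetical distribution

print(expected_shift(table, 5, uniform))   # (4 + 5 + 2 + 1) / 4
print(expected_shift(table, 5, at_rich))
```

Under the same independence assumption, a text of length n is covered in roughly n divided by the expected shift windows, which gives a starting point for the second question.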
Exact string matching for one pattern
Experimental efficiency (Navarro & Raffinot)
[Figure: experimental efficiency map. Axes: alphabet size |Σ| (2 to 64) and pattern length (2 to 256). Labels: Horspool, BNDM, BOM, w.]
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
BNDM algorithm
Text :
Pattern :
Search for suffixes of T (the text window) that are factors of the pattern P.
• How is the comparison made?
The state of the search is denoted as a bit vector D, for example
D2 = 1 0 0 0 1 0 0
Once the next character x is read,
D3 = (D2 << 1) & B(x)
where B(x) is the mask of x in the pattern P.
For instance, if B(x) = ( 0 0 1 1 0 0 0 ), then
D3 = ( 0 0 0 1 0 0 0 ) & ( 0 0 1 1 0 0 0 ) = ( 0 0 0 1 0 0 0 )
• Which is the next position of the window?
It depends on the value of the leftmost bit of D.
BNDM algorithm: example
Given the pattern ATGTA
• The mask of characters is:
B(A) = ( 1 0 0 0 1 )
B(C) = ( 0 0 0 0 0 )
B(G) = ( 0 0 1 0 0 )
B(T) = ( 0 1 0 1 0 )
• The searching phase:
G T A C T A G A G G A C G T A T G T A C T G ...
A T G T A
          A T G T A
                    A T G T A
                            A T G T A
First window (G T A C T), read from right to left:
D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )
Second window (A G A G G):
D1 = ( 0 0 1 0 0 )
D2 = ( 0 1 0 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 0 0 0 )
Third window (A C G T A):
D1 = ( 1 0 0 0 1 )
D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 )
D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D4 = ( 0 1 0 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )
BNDM algorithm: example of window shift
• Given the pattern ATGTA
• The mask of characters is:
B(A) = ( 1 0 0 0 1 )
B(C) = ( 0 0 0 0 0 )
B(G) = ( 0 0 1 0 0 )
B(T) = ( 0 1 0 1 0 )
• The searching phase:
G T A C T A G A G G A C G T A T G T A C T G ...
                    A T G T A
                            A T G T A
D1 = ( 1 0 0 0 1 )
D2 = ( 0 0 0 1 0 ) & ( 0 1 0 1 0 ) = ( 0 0 0 1 0 )
D3 = ( 0 0 1 0 0 ) & ( 0 0 1 0 0 ) = ( 0 0 1 0 0 )
D4 = ( 0 1 0 0 0 ) & ( 0 1 0 1 0 ) = ( 0 1 0 0 0 )
D5 = ( 1 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 )
D6 = ( 0 0 0 0 0 ) & ( * * * * * ) = ( 0 0 0 0 0 ) Found
BNDM algorithm: example
Given the pattern ATGTA
• The mask of characters is:
B(A) = ( 1 0 0 0 1 )
B(C) = ( 0 0 0 0 0 )
B(G) = ( 0 0 1 0 0 )
B(T) = ( 0 1 0 1 0 )
• The searching phase:
G T A C T A G A A T A C G T A T G T A C T G ...
A T G T A
          A T G T A
                A T G T A
First window (G T A C T), read from right to left:
D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 0 0 0 0 0 ) = ( 0 0 0 0 0 )
Second window (A G A A T):
D1 = ( 0 1 0 1 0 )
D2 = ( 1 0 1 0 0 ) & ( 1 0 0 0 1 ) = ( 1 0 0 0 0 )
D3 = ( 0 0 0 0 0 ) & ( 1 0 0 0 1 ) = ( 0 0 0 0 0 )
How is the shift determined?
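The bit-vector updates traced above can be sketched in Python as follows. This is a hedged illustration rather than the course's implementation: the names `bndm_search`, `last` and `windows` are assumptions, and Python integers are unbounded, so the usual requirement that the pattern fit in a machine word is not enforced here. The variable `last` records the most recent window offset at which the leftmost bit of D was set, that is, where a prefix of the pattern was recognised, and the window is shifted by that amount.

```python
def bndm_search(text: str, pattern: str) -> tuple[list[int], list[int]]:
    """Return (match positions, window positions visited)."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    full = (1 << m) - 1               # m ones
    high = 1 << (m - 1)               # leftmost bit of D
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        # The leftmost bit corresponds to the first pattern character.
        masks[ch] = masks.get(ch, 0) | (1 << (m - 1 - i))

    matches, windows = [], []
    pos = 0
    while pos <= n - m:
        windows.append(pos)
        j, last, d = m, m, full
        while d != 0:
            d &= masks.get(text[pos + j - 1], 0)   # read the window right to left
            j -= 1
            if d & high:              # the suffix read so far is a prefix of the pattern
                if j > 0:
                    last = j          # remember where that prefix starts
                else:
                    matches.append(pos)            # the whole window matched
            d = (d << 1) & full
        pos += last
    return matches, windows


print(bndm_search("GTACTAGAGGACGTATGTACTG", "ATGTA"))
print(bndm_search("GTACTAGAATACGTATGTACTG", "ATGTA"))
```

In the second text, the window A G A A T sets the leftmost bit after two characters, so `last` becomes 3 and the next window starts three cells to the right, as in the third row of the alignment.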
Extended string matching
• Classes of characters: some DNA files or patterns contain additional characters such as N or R, where N = {A,C,G,T} and R = {G,A}.
• Bounded length gaps: patterns such as ATx(2,3)TA, where x(2,3) means any 2 or 3 characters.
• Optional characters: patterns such as AC?ACT?T?A, where C? means that C may or may not appear in the text.
• Wild cards: patterns such as AT*TA, where * means an arbitrarily long string.
• Repeatable characters: patterns such as AT[TA]*AT, where [TA]* means that TA can appear zero or more times.
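As an illustration, each of the five extensions can be written with Python's standard `re` module. The translation is an assumption made for this sketch, not the notation of the slides: in particular, the slide's `[TA]*` (the string TA repeated) becomes the group `(?:TA)*`, because in regular-expression syntax `[TA]` would mean "T or A". The sample pattern `ANGRT` and the helper `with_classes` are hypothetical.

```python
import re

# Only the two classes named in the list above.
CLASSES = {"N": "[ACGT]", "R": "[GA]"}


def with_classes(pattern: str) -> str:
    """Expand class characters such as N and R into regex character sets."""
    return "".join(CLASSES.get(ch, ch) for ch in pattern)


examples = {
    "classes of characters": with_classes("ANGRT"),   # hypothetical pattern
    "bounded length gap": "AT.{2,3}TA",               # ATx(2,3)TA
    "optional characters": "AC?ACT?T?A",
    "wild card": "AT.*TA",                            # AT*TA
    "repeatable characters": "AT(?:TA)*AT",           # AT[TA]*AT
}

for name, regex in examples.items():
    print(name, regex, bool(re.search(regex, "GGATTATAATCC")))
```

A general regex engine is convenient for experiments; dedicated algorithms for these extensions, such as bit-parallel ones, are a separate topic.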
Exact string matching for one pattern
Most efficient algorithms (Navarro & Raffinot)
[Figure: experimental efficiency map. Axes: alphabet size |Σ| (2 to 64) and pattern length (2 to 256). Labels: Horspool, BNDM, BOM, w.]
BNDM : Backward Nondeterministic Dawg Matching
BOM : Backward Oracle Matching
Factor Oracle automaton: properties
Factor Oracle of the word G T A T G T A
[Figure: factor oracle automaton of GTATGTA; only the transition labels G, T and A survive.]
All states are accepting states.
It recognizes all factors … but also more words; which ones?
If a word is rejected, it is not a factor; then ...
BOM algorithm (Backward Oracle Matching)
Text :
Pattern : Automaton: Factor Oracle
• How is the comparison made? The window is checked from right to left.
• How many cells are shifted? Let a be the character being read:
• If there is no transition by a in the automaton ...
• If we reach the last state of the automaton with a ...
BOM algorithm: example
• The automaton of the reversed pattern is built: given the pattern ATGTATG
[Figure: factor oracle automaton of the reversed pattern GTATGTA.]
• And the search is:
G T A C T A G A A T G T G T A G A C A T G T A T G G G A ...
A T G T A T G
            A T G T A T G
                A T G T A T G
                      A T G T A T G
                                    A T G T A T G
                                      A T G T A T G
How is the comparison made?
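The search above can be reproduced with the following Python sketch. It is an illustration under stated assumptions: the names `build_factor_oracle`, `bom_search` and `windows` are not from the slides, and the oracle is built with the usual supply-link construction, written compactly here so that the block is self-contained. When the oracle has no transition for the character just read, the window is moved to start right after that character; when the whole window is read, an occurrence is reported and the window moves one cell.

```python
def build_factor_oracle(word: str) -> list[dict[str, int]]:
    """Transitions of the factor oracle of word, one dict per state."""
    delta: list[dict[str, int]] = [{} for _ in range(len(word) + 1)]
    supply = [-1] * (len(word) + 1)
    for i, ch in enumerate(word, start=1):
        delta[i - 1][ch] = i
        k = supply[i - 1]
        while k > -1 and ch not in delta[k]:
            delta[k][ch] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else delta[k][ch]
    return delta


def bom_search(text: str, pattern: str) -> tuple[list[int], list[int]]:
    """Return (match positions, window positions visited)."""
    n, m = len(text), len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    delta = build_factor_oracle(pattern[::-1])    # oracle of the reversed pattern
    matches, windows = [], []
    pos = 0
    while pos <= n - m:
        windows.append(pos)
        state, j = 0, m
        # Read the window from right to left through the oracle.
        while j > 0 and state is not None:
            state = delta[state].get(text[pos + j - 1])
            j -= 1
        if state is not None:         # the whole window was read: occurrence
            matches.append(pos)
        pos += j + 1                  # skip past the character that failed
    return matches, windows


print(bom_search("GTACTAGAATGTGTAGACATGTATGGGA", "ATGTATG"))
```

The window positions returned for this text are 0, 6, 8, 11, 18 and 19, which correspond to the six rows of the alignment.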
Factor Oracle automaton
Given the pattern GTATA, in which state are the factors accepted?
[Figure: automaton with the factors accepted at each state: G | GT, T | GTA, TA, A.]
• When the new T is read, 4 factors should be accepted: GTAT, TAT, AT, T. How can they be reached?
• When the new A is read, 5 factors should be accepted: GTATA, TATA, ATA, TA, A. How can they be reached?
Factor Oracle automaton
[Figure: automaton with the factors accepted at each state: G | GT, T | GTA, TA, A | GTAT, TAT, AT, T | GTATA, TATA, ATA, TA, A.]
• When the new G is read, 6 factors should be accepted: GTATAG, TATAG, ATAG, TAG, AG, G.
Factor Oracle automaton: linear algorithm
?
Factor Oracle automaton: algorithm
If there is a T transition ...
… and recursively continue ...
But if there isn't a T transition ...
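The figures for this construction are not recoverable, so the following Python sketch shows the usual online construction of the factor oracle with supply links, as a plausible reading of the steps above rather than the slides' exact formulation. The names `factor_oracle`, `supply` and `accepts` are assumptions. For each new character, the construction follows supply links backwards: while the visited state has no transition by that character, one is added to the new state; as soon as a state already has such a transition, the walk stops.

```python
def factor_oracle(word: str) -> tuple[list[dict[str, int]], list[int]]:
    """Build the factor oracle of word; return (transitions, supply links)."""
    m = len(word)
    delta: list[dict[str, int]] = [{} for _ in range(m + 1)]
    supply = [-1] * (m + 1)
    for i in range(1, m + 1):
        ch = word[i - 1]
        delta[i - 1][ch] = i          # internal transition i-1 -> i
        k = supply[i - 1]
        # No transition by ch from k: add one to the new state and continue.
        while k > -1 and ch not in delta[k]:
            delta[k][ch] = i          # external transition k -> i
            k = supply[k]
        # A transition by ch exists (or the links ran out): set the supply link.
        supply[i] = 0 if k == -1 else delta[k][ch]
    return delta, supply


def accepts(delta: list[dict[str, int]], word: str) -> bool:
    """All states are accepting, so a word is accepted if it can be read."""
    state = 0
    for ch in word:
        if ch not in delta[state]:
            return False
        state = delta[state][ch]
    return True


delta, supply = factor_oracle("GTATGTA")
print(delta)
print(supply)
print(accepts(delta, "GTG"), "GTG" in "GTATGTA")
```

The last line illustrates the property noted earlier: the oracle of GTATGTA accepts GTG, which is not a factor of the word, so acceptance alone does not prove that a word is a factor, while rejection does prove that it is not.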

Horspool Algorithm in Design and Analysis of Algorithms in VTU
