Dynamic Programming:
basics and case studies
Houston Machine Learning Meetup
11/16/2019
Dynamic Programming: name and story
• Richard Bellman coined the term “Dynamic Programming”
Bellman autobiography
“The face of Wilson (the Secretary of Defense) would turn red, and he would get
violent if people used the term RESEARCH in his presence. You can imagine how he
felt, then, about the term MATHEMATICAL …. I had to do something to shield Wilson
and the Air Force from the fact that I was really doing MATHEMATICS inside the
RAND Corporation…. I decided therefore to use the word “PROGRAMMING". I
wanted to get across the idea that this was DYNAMIC, this was multistage, this was
time-varying…. I thought dynamic programming was a good name. It was something
not even a Congressman could object to..."
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by recursion
public int fib(int n) {
if (n == 0 || n == 1) { return n; }
return fib(n - 1) + fib(n - 2);
}
Time complexity: O(2^N)
Recursion tree of Fibonacci sequence
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1 2
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1 2 3
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1 2 3 5
Fibonacci sequence
• Recursion:
• F(n) = F(n – 1) + F(n – 2)
• Starts from n
• When computing F(n), F(n-1) and F(n-2) are not known yet
• DP:
• F(n) = F(n – 1) + F(n – 2)
• Starts from 0 and 1
• When computing F(n), F(n-1) and F(n-2) have already been stored in the array
• Dynamic programming: partial result stored to save time
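The bottom-up fill shown in the table slides above can be written in the same Java style as the recursive snippet (a minimal sketch; the class and method names are illustrative):

```java
public class Fib {
    // Bottom-up Fibonacci: each F(i) is computed once and stored,
    // so the running time is O(N) instead of the recursive O(2^N).
    public static int fibDP(int n) {
        if (n <= 1) return n;
        int[] f = new int[n + 1];        // f[i] holds F(i)
        f[0] = 0;
        f[1] = 1;
        for (int i = 2; i <= n; i++) {
            // F(i-1) and F(i-2) are already in the array when we need them
            f[i] = f[i - 1] + f[i - 2];
        }
        return f[n];
    }
}
```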
Longest common subsequence
• To find the longest subsequence common to two or more sequences
• String1: “AGCAT”
• String2: “GAC”
• Common subsequences: “A”, “C”, “G”, “AC”, “GA”, …
• LCS: “AC” or “GA”
• To use a table to find LCS:
• First column: string1(“AGCAT”)
• First row: string2(“GAC”)
• Table[i, j]: LCS of string1.substring(0, i) and string2.substring(0, j)
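A minimal Java sketch of this table (illustrative names; dp[i][j] stores the LCS length of the prefixes rather than the subsequence itself):

```java
public class LCS {
    // dp[i][j] = length of the LCS of a.substring(0, i) and b.substring(0, j);
    // row 0 and column 0 stay 0 (LCS with an empty string is empty).
    public static int lcs(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;               // extend the LCS
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); // drop one char
                }
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

For the slide's example, lcs("AGCAT", "GAC") fills a 6 x 4 table and returns 2, the length of "AC" or "GA".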
Longest common subsequence
Wildcard matching
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
- * a * b
- T T F F F
a
d
c
a
b
Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T
b
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T T F
b F T F T
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T T F
b F T F T T
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
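The base case and induction rule walked through in the tables above can be sketched in Java as follows (illustrative names; rows index prefixes of the string, columns index prefixes of the pattern, as in the slides):

```java
public class Wildcard {
    // dp[i][j] = does string[0 .. i) match pattern[0 .. j)?
    public static boolean isMatch(String s, String p) {
        boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
        dp[0][0] = true;                                  // empty matches empty
        for (int j = 1; j <= p.length(); j++) {           // first row: only '*' matches ""
            dp[0][j] = dp[0][j - 1] && p.charAt(j - 1) == '*';
        }
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= p.length(); j++) {
                char pc = p.charAt(j - 1);
                if (pc == '*') {
                    // '*' matches empty (dp[i][j-1]) or one more char (dp[i-1][j])
                    dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
                } else if (pc == '?' || pc == s.charAt(i - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];          // rule (1): extend the match
                }
                // otherwise dp[i][j] stays false
            }
        }
        return dp[s.length()][p.length()];
    }
}
```

isMatch("adcab", "*a*b") fills exactly the 6 x 5 table on these slides and returns the bottom-right cell, true.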
Longest common subsequence and wildcard
matching
• DP starts from initial condition to the end of string:
• From left to right at each row
• From top to bottom at each column
• State transition from table[i - 1][j - 1], table[i][j - 1], table[i - 1][j] to
table[i][j]
• Each time: move forward by one step
• The state at each step is the global optimum of that step
• A table (or diagram) is the best tool to simulate the process
Matrix chain multiplication
• Multiply two matrices: A(10 x 100) and B(100 x 5)
• OUT[p][r] += A[p][q] * B[q][r]
• Computation = 10 x 100 x 5
• Multiply three matrices: A1(10 x 100), A2(100 x 5), and A3(5 x 50)
• ((A1 A2) A3) : 10 x 100 x 5 (A1 A2) + 10 x 5 x 50 = 7500
• (A1 (A2 A3)) : 100 x 5 x 50 (A2 A3) + 10 x 100 x 50 = 75000
• ((A1 A2) A3) is 10 times faster than (A1 (A2 A3)) in terms of scalar
computations
Matrix chain multiplication
• How to optimize the chain multiplication of matrices ( A1, A2, A3, ….
An)
• DP induction rule: M[i, j] = min over i ≤ k < j of { M[i, k] + M[k + 1, j] + p(i - 1) p(k) p(j) },
where matrix A(k) has dimensions p(k - 1) x p(k)
Matrix chain multiplication: DP solution
• Six matrices multiplication:
• Status:
• M[i, j]: the min number of computations for the matrices (i to j) multiplication
• S[i, j]: the last-layer break-point for M[i, j]
Matrix chain multiplication: DP solution
• Six matrices multiplication:
(A1 (A2 A3)) ((A4 A5) A6)
Matrix chain multiplication: DP solution
• The state is hard to define:
• M[i, j]
• S[i, j]
• The state transition is complicated:
• Filling by row and by column does not work
• Move from previous states to the current state by chain length (induction rule)
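The length-based induction can be sketched in Java as follows (illustrative names; p holds the dimension sequence, so matrix A_k is p[k-1] x p[k], and the inner loop over k plays the role of the break point stored in S[i, j] on the slides):

```java
public class MatrixChain {
    // m[i][j] = min number of scalar multiplications to compute A_i ... A_j.
    public static int minCost(int[] p) {
        int n = p.length - 1;                     // number of matrices
        int[][] m = new int[n + 1][n + 1];        // m[i][i] = 0 by default
        for (int len = 2; len <= n; len++) {      // induction variable: chain length
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                m[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {     // last-layer break point
                    int cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                    m[i][j] = Math.min(m[i][j], cost);
                }
            }
        }
        return m[1][n];
    }
}
```

For the three-matrix example on the earlier slide, minCost(new int[]{10, 100, 5, 50}) returns 7500, the cost of ((A1 A2) A3).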
Framework of dynamic programming
• Three key components of dynamic programming algorithm:
• Definition of state
• Initial condition (base)
• Induction rule (state transition)
• Induction rule: difficult to find
• 1D/2D table for the thinking process
What is part of speech tagging?
• Identify parts of speech (syntactic categories):
This is a simple sentence
DET VB DET ADJ NOUN
• POS tagging is a first step towards syntactic analysis (and semantic analysis)
• Faster than full parsing
• Text classification and word disambiguation
• How to decide the correct label:
• Word to be labeled: chair is probably a noun
• Labels of surrounding words: if the preceding word is a modal verb (e.g., will) then this
word is more likely to be a verb
• Hidden Markov models can be used to work on this problem
Why is POS tagging hard?
• Ambiguity
glass of water/NOUN vs. water/VERB the plants
lie/VERB down vs. tell a lie/NOUN
wind/VERB down vs. a mighty wind/NOUN(homographs)
How about time flies like an arrow?
• Sparse data:
• Words we haven’t seen before
• Word-Tag pairs we haven’t seen before
Example transition probabilities
• Probabilities estimated from tagged WSJ corpus:
• Proper nouns (NNP) often begin sentences: P(NNP|<s>) = 0.28
• Modal verbs (MD) nearly always followed by bare verbs (VB).
• Adjectives (JJ) are often followed by nouns (NN).
Example output probabilities
• Probabilities estimated from tagged WSJ corpus:
• 0.0032% of proper nouns are Janet: P(Janet|NNP) = 0.000032
• About half of determiners (DT) are the.
• the can also be a proper noun.
Hidden Markov Model
• A set of states (tags)
• An output alphabet (words)
• Initial state (beginning of sentence)
• State transition probabilities ( P(ti|ti-1) )
• Symbol emission probabilities ( P(wi|ti) )
Hidden Markov Model
• Model the tagging process:
• Sentence: W = (w1, w2, … wn)
• Tags T = (t1, t2, …, tn)
• Joint probability: P(W, T) = ∏_{i=1..n} P(t_i | t_{i-1}) P(w_i | t_i) × P(</s> | t_n)
• Example:
• This/DET is/VB a/DET simple/JJ sentence/NN
• Add begin(<s>) and end-of-sentence (</s>):
P(W, T) = ∏_{i=1..n} P(t_i | t_{i-1}) P(w_i | t_i) × P(</s> | t_n)
= P(DET|<s>) P(VB|DET) P(DET|VB) P(JJ|DET) P(NN|JJ)
P(</s>|NN) × P(This|DET) P(is|VB) P(a|DET) P(simple|JJ)
P(sentence|NN)
Computation estimation of POS
• Suppose we have C possible tags for each of the n words in the
sentence
• There are C^n possible tag sequences: the number grows
exponentially in the length n
• Viterbi algorithm: use dynamic programming to solve it
Viterbi algorithm:
• Target: argmaxT P(T|W)
• Intuition: the best path of length i ending at state t must extend the best
path of length i - 1 to some previous state
• Use a table to store the partial result:
• T × N table: v(t, i) is the probability of the best tag sequence for w1 … wi
ending at state t
• Fill in columns from left to right; the max is over each possible previous tag t’
v(t, i) = max over t’ { v(t’, i – 1) × P(t | t’) × P(w_i | t) }
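The column-by-column fill can be sketched in Java as follows (illustrative names and a toy probability setup, not the WSJ-estimated probabilities; tags and words are encoded as integer indices):

```java
public class Viterbi {
    // init[t]      = P(t | <s>)         (start-of-sentence transition)
    // trans[tp][t] = P(t | tp)          (tag-to-tag transition)
    // emit[t][w]   = P(w | t)           (emission)
    // Returns the probability of the best tag sequence for the word indices.
    public static double best(double[] init, double[][] trans,
                              double[][] emit, int[] words) {
        int T = init.length, n = words.length;
        double[][] v = new double[T][n];          // v[t][i]: best score ending in t at word i
        for (int t = 0; t < T; t++) {
            v[t][0] = init[t] * emit[t][words[0]];          // first column
        }
        for (int i = 1; i < n; i++) {                       // fill left to right
            for (int t = 0; t < T; t++) {
                double max = 0.0;
                for (int tp = 0; tp < T; tp++) {            // max over previous tags t'
                    max = Math.max(max, v[tp][i - 1] * trans[tp][t]);
                }
                v[t][i] = max * emit[t][words[i]];
            }
        }
        double ans = 0.0;                                   // best score in the last column
        for (int t = 0; t < T; t++) ans = Math.max(ans, v[t][n - 1]);
        return ans;
    }
}
```

This computes T × T work per column instead of enumerating all C^n tag sequences; a full tagger would also keep back-pointers to recover the argmax sequence.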
Viterbi algorithm: case study
Viterbi algorithm: case study
• W = the doctor is in.
Viterbi algorithm: all tagged
Dynamic programming: take-home message
• Why fast: use memory to store partial results
• DP algorithm components: state definition, initial condition, and
induction rule
• Solve DP problem with a table
Top ten DP problems
• Longest common subsequence
• Shortest common supersequence
• Longest increasing subsequence
• Edit distance
• Matrix chain multiplication
• 0-1 knapsack problem
• Partition problem
• Rod cutting
• Coin change problem
• Word break problem
Reference
• http://people.cs.georgetown.edu/nschneid/cosc572/f16/12_viterbi_slides.pdf
• https://en.wikipedia.org/wiki/Dynamic_programming
• https://medium.com/@codingfreak/top-10-dynamic-programming-problems-5da486eeb360
• https://leetcode.com/problems/wildcard-matching/description/
• https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
