Dynamic Programming:
basics and case studies
Houston Machine Learning Meetup
11/16/2019
Dynamic Programming: name and story
• Richard Bellman coined the term “Dynamic Programming”
Bellman autobiography
“The face of Wilson (the Secretary of Defense) would turn red, and he would get
violent if people used the term RESEARCH in his presence. You can imagine how he
felt, then, about the term MATHEMATICAL …. I had to do something to shield Wilson
and the Air Force from the fact that I was really doing MATHEMATICS inside the
RAND Corporation…. I decided therefore to use the word “PROGRAMMING". I
wanted to get across the idea that this was DYNAMIC, this was multistage, this was
time-varying…. I thought dynamic programming was a good name. It was something
not even a Congressman could object to..."
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by recursion
public int fib(int n) {
if (n == 0 || n == 1) { return n; }
return fib(n - 1) + fib(n - 2);
}
Time complexity: O(2^N)
Recursion tree of Fibonacci sequence
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1 2
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1 2 3
Fibonacci sequence
• Definition:
• F(0) = 0
• F(1) = 1
• F(n) = F(n – 1) + F(n – 2)
• Solved by DP
Time complexity: O(N)
Index 0 1 2 3 4 5 …..
F(N) 0 1 1 2 3 5
Fibonacci sequence
• Recursion:
• F(n) = F(n – 1) + F(n – 2)
• Starts from n
• When computing F(n), F(n-1) and F(n-2) are not known yet
• DP:
• F(n) = F(n – 1) + F(n – 2)
• Starts from 0 and 1
• When computing F(n), F(n-1) and F(n-2) have already been stored in the array
• Dynamic programming: partial result stored to save time
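The bottom-up fill shown in the table slides above can be written in the same Java style as the recursive snippet (a minimal sketch; the class and method names are illustrative):

```java
public class Fib {
    // Bottom-up Fibonacci: each F(i) is computed once and stored,
    // so the running time is O(N) instead of the recursive O(2^N).
    public static int fibDP(int n) {
        if (n <= 1) return n;
        int[] f = new int[n + 1];        // f[i] holds F(i)
        f[0] = 0;
        f[1] = 1;
        for (int i = 2; i <= n; i++) {
            // F(i-1) and F(i-2) are already in the array when we need them
            f[i] = f[i - 1] + f[i - 2];
        }
        return f[n];
    }
}
```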
Longest common subsequence
• To find the longest subsequence common to two or more sequences
• String1: “AGCAT”
• String2: “GAC”
• Common subsequences: “A”, “C”, “G”, “AC”, “GA”, …
• LCS: “AC” or “GA”
• To use a table to find LCS:
• First column: string1(“AGCAT”)
• First row: string2(“GAC”)
• Table[i, j]: LCS of string1.substring(0, i) and string2.substring(0, j)
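A minimal Java sketch of this table (illustrative names; dp[i][j] stores the LCS length of the prefixes rather than the subsequence itself):

```java
public class LCS {
    // dp[i][j] = length of the LCS of a.substring(0, i) and b.substring(0, j);
    // row 0 and column 0 stay 0 (LCS with an empty string is empty).
    public static int lcs(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;               // extend the LCS
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]); // drop one char
                }
            }
        }
        return dp[a.length()][b.length()];
    }
}
```

For the slide's example, lcs("AGCAT", "GAC") fills a 6 x 4 table and returns 2, the length of "AC" or "GA".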
Longest common subsequence
Wildcard matching
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
- * a * b
- T T F F F
a
d
c
a
b
Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T
b
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T T F
b F T F T
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
Wildcard matching
- * a * b
- T T F F F
a F T T T F
d F T F T F
c F T F T F
a F T T T F
b F T F T T
• Linux command-line:
user@bash: ls b*
barry.txt, blan.txt, bob.txt
• Complicated example:
string = "adcab"
pattern = "*a*b"
• DP solution:
• Definition: table[i][j]: whether string[0 .. i) matches pattern[0 .. j)
• Base case:
table[0][0] = true
first row: table[0][j + 1] = table[0][j] (if pattern[j] = '*')
• Induction rule:
(1) if string[i] equals pattern[j] or pattern[j] equals '?':
table[i + 1][j + 1] = table[i][j]
(2) if pattern[j] equals '*':
table[i + 1][j + 1] = table[i + 1][j] or table[i][j + 1]
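The base case and induction rule walked through in the tables above can be sketched in Java as follows (illustrative names; rows index prefixes of the string, columns index prefixes of the pattern, as in the slides):

```java
public class Wildcard {
    // dp[i][j] = does string[0 .. i) match pattern[0 .. j)?
    public static boolean isMatch(String s, String p) {
        boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
        dp[0][0] = true;                                  // empty matches empty
        for (int j = 1; j <= p.length(); j++) {           // first row: only '*' matches ""
            dp[0][j] = dp[0][j - 1] && p.charAt(j - 1) == '*';
        }
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= p.length(); j++) {
                char pc = p.charAt(j - 1);
                if (pc == '*') {
                    // '*' matches empty (dp[i][j-1]) or one more char (dp[i-1][j])
                    dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
                } else if (pc == '?' || pc == s.charAt(i - 1)) {
                    dp[i][j] = dp[i - 1][j - 1];          // rule (1): extend the match
                }
                // otherwise dp[i][j] stays false
            }
        }
        return dp[s.length()][p.length()];
    }
}
```

isMatch("adcab", "*a*b") fills exactly the 6 x 5 table on these slides and returns the bottom-right cell, true.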
Longest common subsequence and wildcard
matching
• DP starts from initial condition to the end of string:
• From left to right at each row
• From top to bottom at each column
• State transition from table[i - 1][j - 1], table[i][j - 1], table[i - 1][j] to
table[i][j]
• Each time: move forward by one step
• The state at each step is the global optimum of that step
• A table (or diagram) is the best tool to simulate the process
Matrix chain multiplication
• Multiply two matrices: A(10 x 100) and B(100 x 5)
• OUT[p][r] += A[p][q] * B[q][r]
• Computation = 10 x 100 x 5
• Multiply three matrices: A1(10 x 100), A2(100 x 5), and A3(5 x 50)
• ((A1 A2) A3) : 10 x 100 x 5 (A1 A2) + 10 x 5 x 50 = 7500
• (A1 (A2 A3)) : 100 x 5 x 50 (A2 A3) + 10 x 100 x 50 = 75000
• ((A1 A2) A3) is 10 times faster than (A1 (A2 A3)) in terms of scalar
computations
Matrix chain multiplication
• How to optimize the chain multiplication of matrices ( A1, A2, A3, ….
An)
• DP induction rule: M[i, j] = min over i ≤ k < j of { M[i, k] + M[k + 1, j] + p(i - 1) p(k) p(j) },
where matrix A(k) has dimensions p(k - 1) x p(k)
Matrix chain multiplication: DP solution
• Six matrices multiplication:
• Status:
• M[i, j]: the min number of computations for the matrices (i to j) multiplication
• S[i, j]: the last-layer break-point for M[i, j]
Matrix chain multiplication: DP solution
• Six matrices multiplication:
(A1 (A2 A3)) ((A4 A5) A6)
Matrix chain multiplication: DP solution
• The state is hard to define:
• M[i, j]
• S[i, j]
• The state transition is complicated:
• Filling by row and by column does not work
• Move from previous states to the current state by chain length (induction rule)
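The length-based induction can be sketched in Java as follows (illustrative names; p holds the dimension sequence, so matrix A_k is p[k-1] x p[k], and the inner loop over k plays the role of the break point stored in S[i, j] on the slides):

```java
public class MatrixChain {
    // m[i][j] = min number of scalar multiplications to compute A_i ... A_j.
    public static int minCost(int[] p) {
        int n = p.length - 1;                     // number of matrices
        int[][] m = new int[n + 1][n + 1];        // m[i][i] = 0 by default
        for (int len = 2; len <= n; len++) {      // induction variable: chain length
            for (int i = 1; i + len - 1 <= n; i++) {
                int j = i + len - 1;
                m[i][j] = Integer.MAX_VALUE;
                for (int k = i; k < j; k++) {     // last-layer break point
                    int cost = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j];
                    m[i][j] = Math.min(m[i][j], cost);
                }
            }
        }
        return m[1][n];
    }
}
```

For the three-matrix example on the earlier slide, minCost(new int[]{10, 100, 5, 50}) returns 7500, the cost of ((A1 A2) A3).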
Framework of dynamic programming
• Three key components of dynamic programming algorithm:
• Definition of state
• Initial condition (base)
• Induction rule (state transition)
• Induction rule: difficult to find
• 1D/2D table for the thinking process
What is part of speech tagging?
• Identify parts of speech (syntactic categories):
This is a simple sentence
DET VB DET ADJ NOUN
• POS tagging is a first step towards syntactic analysis (and semantic analysis)
• Faster than full parsing
• Text classification and word disambiguation
• How to decide the correct label:
• Word to be labeled: chair is probably a noun
• Labels of surrounding words: if the preceding word is a modal verb (e.g., will) then this
word is more likely to be a verb
• Hidden Markov models can be used to work on this problem
Why is POS tagging hard?
• Ambiguity
glass of water/NOUN vs. water/VERB the plants
lie/VERB down vs. tell a lie/NOUN
wind/VERB down vs. a mighty wind/NOUN(homographs)
How about time flies like an arrow?
• Sparse data:
• Words we haven’t seen before
• Word-Tag pairs we haven’t seen before
Example transition probabilities
• Probabilities estimated from tagged WSJ corpus:
• Proper nouns (NNP) often begin sentences: P(NNP|<s>) = 0.28
• Modal verbs (MD) nearly always followed by bare verbs (VB).
• Adjectives (JJ) are often followed by nouns (NN).
Example output probabilities
• Probabilities estimated from tagged WSJ corpus:
• 0.0032% of proper nouns are Janet: P(Janet|NNP) = 0.000032
• About half of determiners (DT) are the.
• the can also be a proper noun.
Hidden Markov Model
• A set of states (tags)
• An output alphabet (words)
• Initial state (beginning of sentence)
• State transition probabilities ( P(ti|ti-1) )
• Symbol emission probabilities ( P(wi|ti) )
Hidden Markov Model
• Model the tagging process:
• Sentence: W = (w1, w2, … wn)
• Tags T = (t1, t2, …, tn)
• Joint probability: P(W, T) = ∏_{i=1..n} P(t_i | t_{i-1}) P(w_i | t_i) × P(</s> | t_n)
• Example:
• This/DET is/VB a/DET simple/JJ sentence/NN
• Add begin(<s>) and end-of-sentence (</s>):
P(W, T) = ∏_{i=1..n} P(t_i | t_{i-1}) P(w_i | t_i) × P(</s> | t_n)
= P(DET|<s>) P(VB|DET) P(DET|VB) P(JJ|DET) P(NN|JJ)
P(</s>|NN) × P(This|DET) P(is|VB) P(a|DET) P(simple|JJ)
P(sentence|NN)
Computation estimation of POS
• Suppose we have C possible tags for each of the n words in the
sentence
• There are C^n possible tag sequences: the number grows
exponentially in the length n
• Viterbi algorithm: use dynamic programming to solve it
Viterbi algorithm:
• Target: argmaxT P(T|W)
• Intuition: the best path of length i ending at state t must extend the best
path of length i - 1 to some previous state
• Use a table to store the partial result:
• T × N table: v(t, i) is the probability of the best tag sequence for w1 … wi
ending at state t
• Fill in columns from left to right; the max is over each possible previous tag t’
v(t, i) = max over t’ { v(t’, i – 1) × P(t | t’) × P(w_i | t) }
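The column-by-column fill can be sketched in Java as follows (illustrative names and a toy probability setup, not the WSJ-estimated probabilities; tags and words are encoded as integer indices):

```java
public class Viterbi {
    // init[t]      = P(t | <s>)         (start-of-sentence transition)
    // trans[tp][t] = P(t | tp)          (tag-to-tag transition)
    // emit[t][w]   = P(w | t)           (emission)
    // Returns the probability of the best tag sequence for the word indices.
    public static double best(double[] init, double[][] trans,
                              double[][] emit, int[] words) {
        int T = init.length, n = words.length;
        double[][] v = new double[T][n];          // v[t][i]: best score ending in t at word i
        for (int t = 0; t < T; t++) {
            v[t][0] = init[t] * emit[t][words[0]];          // first column
        }
        for (int i = 1; i < n; i++) {                       // fill left to right
            for (int t = 0; t < T; t++) {
                double max = 0.0;
                for (int tp = 0; tp < T; tp++) {            // max over previous tags t'
                    max = Math.max(max, v[tp][i - 1] * trans[tp][t]);
                }
                v[t][i] = max * emit[t][words[i]];
            }
        }
        double ans = 0.0;                                   // best score in the last column
        for (int t = 0; t < T; t++) ans = Math.max(ans, v[t][n - 1]);
        return ans;
    }
}
```

This computes T × T work per column instead of enumerating all C^n tag sequences; a full tagger would also keep back-pointers to recover the argmax sequence.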
Viterbi algorithm: case study
Viterbi algorithm: case study
• W = the doctor is in.
Viterbi algorithm: all tagged
Dynamic programming: take-home message
• Why fast: use memory to store partial results
• DP algorithm components: state definition, initial condition, and
induction rule
• Solve DP problem with a table
Top ten DP problems
• Longest common subsequence
• Shortest common supersequence
• Longest increasing subsequence
• Edit distance
• Matrix chain multiplication
• 0-1 knapsack problem
• Partition problem
• Rod cutting
• Coin change problem
• Word break problem
Reference
• http://people.cs.georgetown.edu/nschneid/cosc572/f16/12_viterbi_slides.pdf
• https://en.wikipedia.org/wiki/Dynamic_programming
• https://medium.com/@codingfreak/top-10-dynamic-programming-problems-5da486eeb360
• https://leetcode.com/problems/wildcard-matching/description/
• https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
