Hidden Markov Model
17.10.14
Contents
• What is a Markov chain (MC)?
• What is an HMM?
• The three problems (Forward, Viterbi, and Baum-Welch algorithms)
• Training an HMM
Markov Chain
• A Markov chain is used to describe sequential (time-ordered) data.
• Q = q1, q2, …, qN : a set of N states
• A = a01, a02, …, an1, …, ann : transition probabilities, with Σ_{j=1}^{N} a_ij = 1 for all i
• q0, qF : special start and final states
Markov Chain
• a_ij = P(q_j | q_i) = P(q_j ∩ q_i) / P(q_i), with Σ_{j=1}^{N} a_ij = 1
• P(q_1 | q_i) + P(q_2 | q_i) + … + P(q_j | q_i) + … + P(q_N | q_i) = 1
• P(q_1 ∩ q_i)/P(q_i) + P(q_2 ∩ q_i)/P(q_i) + … + P(q_j ∩ q_i)/P(q_i) + … + P(q_N ∩ q_i)/P(q_i) = 1
• Markov Assumption: P(q_i | q_1, …, q_{i-1}) = P(q_i | q_{i-1})
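As a concrete illustration of the definitions above, here is a minimal Python sketch of a two-state weather chain. The numbers are assumptions chosen only so that each row of A sums to 1; they are not taken from the slides.

```python
import numpy as np

states = ["Hot", "Cold"]
# A[i, j] = P(q_j | q_i); each row must sum to 1, as required above.
A = np.array([[0.6, 0.4],    # Hot  -> Hot, Cold
              [0.5, 0.5]])   # Cold -> Hot, Cold
pi = np.array([0.8, 0.2])    # start probabilities P(q | start)
assert np.allclose(A.sum(axis=1), 1.0)

def sequence_probability(seq):
    """P(q_1, ..., q_T) under the first-order Markov assumption."""
    idx = [states.index(s) for s in seq]
    p = pi[idx[0]]
    for prev, cur in zip(idx, idx[1:]):
        p *= A[prev, cur]    # one transition probability per step
    return p

# P(Hot) * P(Hot | Hot) * P(Cold | Hot)
print(sequence_probability(["Hot", "Hot", "Cold"]))
```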
First Order Markov Chain
• The probability of being in state q_i depends only on the immediately preceding state.
(e.g., the assumption that today's weather depends only on yesterday's weather)
• For comparison, the second-order Markov assumption: P(q_i | q_1, …, q_{i-1}) = P(q_i | q_{i-1}, q_{i-2})
HMM
• A Markov chain is used to describe sequential data.
Ex) Hot -> Cold -> Cold
• So what is an HMM?
• "The same architecture comes up in speech recognition. In that case we see acoustic events in the world and have to infer the presence of "hidden" words that are the underlying causal source of the acoustics."
ex) "Here I _ _"
-> 'go' or 'am' are the words most likely to appear.
-> Substituting these two words, we infer the intended meaning.
HMM
• Q = q1, q2, …, qN
• A = a01, a02, …, an1, …, ann : transition probabilities, with Σ_{j=1}^{N} a_ij = 1 for all i
• q0, qF
• O = o1, o2, o3, …, oT (sequence of observations)
(ex) the observed sequence of ice creams eaten: 3 -> 1 -> 3)
• B = b_i(o_t) (observation probabilities, a.k.a. emission probabilities)
(the probability of observing o_t in state q_i)
(ex) the probability of eating 1 ice cream on the second day)
First Order Hidden Markov Model
• Markov Assumption: P(q_i | q_1, …, q_{i-1}) = P(q_i | q_{i-1})
• Output Independence:
P(o_i | q_1, …, q_i, …, q_T, o_1, …, o_i, …, o_T) = P(o_i | q_i)
The observation o_t depends only on the state q_t that produced it.
Summary(1)
• Q = q1, q2, …, qN
• A = a01, a02, …, an1, …, ann : transition probabilities, with Σ_{j=1}^{N} a_ij = 1 for all i
• q0, qF
• O = o1, o2, o3, …, oT (sequence of observations)
ex) the observed sequence of ice creams eaten: 3 -> 1 -> 3
ex) the observed sequence of ball colors: red -> blue -> green
• B = b_i(o_t) (observation probabilities, a.k.a. emission probabilities)
(the probability of observing o_t in state q_i)
ex) the probability of eating 1 ice cream on the second day
ex) the probability of drawing a green ball from the second cup
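The five components above map directly onto arrays. Below is a minimal Python encoding of λ for the ice-cream example; the deck only quotes P(3|H) = 0.4, P(1|H) = 0.2, and P(3|C) = 0.1 later on, so every other number here is an assumption for illustration.

```python
import numpy as np

states = ["Hot", "Cold"]          # hidden states Q
symbols = [1, 2, 3]               # observable symbols: ice creams eaten per day

pi = np.array([0.8, 0.2])         # start probabilities a_0j (assumed)
A = np.array([[0.6, 0.4],         # transition probabilities a_ij (assumed)
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],    # b_Hot(1), b_Hot(2), b_Hot(3): 0.2 and 0.4 match the deck
              [0.5, 0.4, 0.1]])   # b_Cold(1), b_Cold(2), b_Cold(3): 0.1 matches the deck

# Every probability table must be row-stochastic.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)

O = [3, 1, 3]                              # observation sequence
O_idx = [symbols.index(o) for o in O]      # as 0-based column indices into B
print(O_idx)
```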
Three Problems for HMM Modeling
• Likelihood: Given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
ex) If Heewon ate 3 -> 1 -> 3 ice creams over three days, the probability under the HMM that 3 -> 1 -> 3 occurs: P(O | λ)
• Decoding: Given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
ex) If Heewon ate 3 -> 1 -> 3 ice creams over three days, compute the most likely sequence of weather states (H or C):
Q = argmax_{Q=(q1,…,qT)} P(Q, O | λ)
• Training: Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.
ex) Learn the transition probabilities (A) between the weather states (H or C) and the observation (emission) probabilities (B) of the observations o_t (eating 3 -> 1 -> 3) in each weather state.
Three Problems for HMM Modeling (2)
1. Likelihood Computation: Forward algorithm
• Compute the likelihood of an observation sequence.
- For a Markov chain: the product of the transition probabilities, ex) Hot -> Hot -> Cold
- For an HMM: … ex) the probability of ice creams 3 -> 1 -> 3: p(3, 1, 3)
(the weather information is hidden)
How do we compute it? Think about the HMM's properties.
• Each hidden state produces only a single observation:
P(O | Q) = Π_{i=1}^{T} p(o_i | q_i)
• The joint probability of O with a particular state sequence Q:
p(O, Q) = Π_{i=1}^{T} p(o_i | q_i) × Π_{i=1}^{T} p(q_i | q_{i-1}) = p(O | Q) p(Q)
• The total probability of the observations, summing over all possible state sequences:
p(O) = Σ_Q p(O, Q) = Σ_Q p(O | Q) p(Q)
Example
• p(3 1 3 | h h c) = p(3 | h) × p(1 | h) × p(3 | c) = 0.4 × 0.2 × 0.1
• p(3 1 3, h h c) = p(h | start) × p(h | h) × p(c | h) × p(end | c) × p(3 1 3 | h h c)
• P(3 1 3) = p(3 1 3, h h h) + p(3 1 3, h h c) + … + p(3 1 3, c c c) ≈ 0.0022, at O(N^T) cost
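A brute-force Python sketch of the O(N^T) enumeration above. The model numbers are assumptions (the deck does not list the full parameter set), so the printed value will not exactly reproduce the ≈ 0.0022 quoted on the slide.

```python
import itertools
import numpy as np

states = [0, 1]                      # 0 = Hot, 1 = Cold
pi = np.array([0.8, 0.2])            # assumed start probabilities
A  = np.array([[0.6, 0.4],           # assumed transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.2, 0.4, 0.4],      # b_Hot(1..3)
               [0.5, 0.4, 0.1]])     # b_Cold(1..3)
O  = [2, 0, 2]                       # ice creams 3, 1, 3 as 0-based symbol indices

total = 0.0
for Q in itertools.product(states, repeat=len(O)):   # all N^T hidden sequences
    p = pi[Q[0]] * B[Q[0], O[0]]                     # start transition + first emission
    for t in range(1, len(O)):
        p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]         # a_{q_{t-1} q_t} * b_{q_t}(o_t)
    total += p
print(total)    # P(3 1 3) by exhaustive enumeration, cost O(N^T)
```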
Forward Algorithm
• Keep the sum of the probabilities of all paths coming into each state i at time t.
• Initialization: α_1(j) = a_0j × b_j(o_1), 1 ≤ j ≤ N
• Recursion: α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) × a_ij × b_j(o_t)
• Termination: P(O | λ) = α_T(q_F) = Σ_{i=1}^{N} α_T(i) × a_iF
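A direct Python implementation of the three steps above, under the same assumed toy model. For simplicity the sketch drops the explicit final state q_F (equivalently takes a_iF = 1), so termination reduces to summing α_T(i).

```python
import numpy as np

# Forward algorithm: alpha[t, j] = P(o_1..o_t, q_t = j | lambda), in O(N^2 T).
pi = np.array([0.8, 0.2])            # assumed toy model (see earlier sketches)
A  = np.array([[0.6, 0.4],
               [0.4, 0.6]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])
O  = [2, 0, 2]                       # observations 3, 1, 3 as 0-based indices
N, T = len(pi), len(O)

alpha = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]                       # initialization: a_0j * b_j(o_1)
for t in range(1, T):
    for j in range(N):                           # recursion: sum over previous states i
        alpha[t, j] = alpha[t-1] @ A[:, j] * B[j, O[t]]
print(alpha[-1].sum())                           # termination: P(O | lambda)
```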
※ α_2(2): the probability of being in the second state (H) at time 2
P(3 1 3) ≈ 0.0022, with O(N²T) complexity
2. Decoding: Viterbi algorithm
• Calculate the most likely sequence of hidden states q_i.
• v_t(j) = max_{q_0, q_1, …, q_{t-1}} P(q_0 … q_{t-1}, o_1 o_2 … o_t, q_t = j | λ)
Viterbi algorithm
• The Viterbi algorithm must produce both a probability and the most likely state sequence.
• It computes this best sequence by keeping track of the path of hidden states that leads to each state.
• Note that the Viterbi algorithm is identical to the forward algorithm except that it takes the max over the previous path probabilities, whereas the forward algorithm takes the sum.
Viterbi algorithm
• Initialization: v_1(j) = a_0j × b_j(o_1), 1 ≤ j ≤ N
• Recursion: v_t(j) = max_{i=1..N} v_{t-1}(i) × a_ij × b_j(o_t)
bt_t(j) = argmax_{i=1..N} v_{t-1}(i) × a_ij × b_j(o_t) (backpointers)
• Termination:
The best score: P* = v_T(q_F) = max_{i=1..N} v_T(i) × a_iF
The start of the backtrace: q_T* = bt_T(q_F) = argmax_{i=1..N} v_T(i) × a_iF
(after the computation finishes, backtrack through the backpointers to select the states)
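A Python sketch of the recursion with backpointers, again under the assumed toy model and without an explicit final state q_F. It is the forward pass with max in place of sum, plus the backtrace.

```python
import numpy as np

# Viterbi: v[t, j] = max over paths of P(q_1..q_{t-1}, o_1..o_t, q_t = j | lambda).
pi = np.array([0.8, 0.2])            # assumed toy model
A  = np.array([[0.6, 0.4],
               [0.4, 0.6]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])
O  = [2, 0, 2]
N, T = len(pi), len(O)

v  = np.zeros((T, N))
bt = np.zeros((T, N), dtype=int)                 # backpointers bt_t(j)
v[0] = pi * B[:, O[0]]                           # initialization
for t in range(1, T):
    for j in range(N):
        scores = v[t-1] * A[:, j] * B[j, O[t]]   # v_{t-1}(i) * a_ij * b_j(o_t)
        bt[t, j] = np.argmax(scores)
        v[t, j]  = scores[bt[t, j]]

best_last = int(np.argmax(v[-1]))                # termination (no explicit q_F here)
path = [best_last]
for t in range(T - 1, 0, -1):                    # backtrace through the backpointers
    path.append(int(bt[t, path[-1]]))
path.reverse()
print(v[-1].max(), path)                         # best score and best state sequence
```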
[Trellis figure: example Viterbi recursion step, 0.009 = max(v_2(2) × 0.6 × 0.4, v_2(1) × 0.4 × 0.4); path values 0.0024 and 0.0009]
Summary(2)
Forward algorithm
• p(O) ex) P(3 1 3 | λ) ≈ 0.0022
Viterbi algorithm
• P(Q, O | λ) ex) P(Q, (3 1 3) | λ) ≈ 0.0009
• Q = argmax_{Q=(q1,…,qT)} P(Q, O | λ) ex) H <- H <- H (read off by backtracing)
3. Learning: HMM Training
• Forward-backward, or Baum-Welch, algorithm
• A special case of the EM algorithm (an iterative algorithm)
• Determine the HMM parameters M = (A, B, π), or λ = (A, B), that best fit the training data, i.e., that maximize P(O | λ).
• Using only the observed data, estimate A (a_ij) and B (b_i(o_t)).
Backward Algorithm
• Initialization: β_T(i) = a_iF, 1 ≤ i ≤ N
• Recursion: β_t(i) = Σ_{j=1}^{N} a_ij × b_j(o_{t+1}) × β_{t+1}(j)
• Termination: P(O | λ) = α_T(q_F) = β_1(q_0) = Σ_{j=1}^{N} a_0j × b_j(o_1) × β_1(j)
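A Python sketch of the backward pass under the same assumed toy model. Without an explicit final state, the initialization β_T(i) = a_iF becomes β_T(i) = 1 (an assumption of this sketch); the termination line reproduces the same P(O | λ) as the forward pass.

```python
import numpy as np

# Backward algorithm: beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda).
pi = np.array([0.8, 0.2])            # assumed toy model
A  = np.array([[0.6, 0.4],
               [0.4, 0.6]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])
O  = [2, 0, 2]
N, T = len(pi), len(O)

beta = np.zeros((T, N))
beta[-1] = 1.0                                       # initialization (a_iF = 1 here)
for t in range(T - 2, -1, -1):
    for i in range(N):                               # recursion
        beta[t, i] = np.sum(A[i] * B[:, O[t+1]] * beta[t+1])
print(np.sum(pi * B[:, O[0]] * beta[0]))             # termination: P(O | lambda)
```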
Backward Algorithm
• When observing o_{t+1} from states q_1 ~ q_N, the respective probabilities are b_1(o_{t+1}) ~ b_N(o_{t+1}).
• A new variable β_t(i) is introduced.
• The backward probabilities help in estimating A and B.
Forward Algorithm
• Initialization: α_1(j) = a_0j × b_j(o_1), 1 ≤ j ≤ N
• Recursion: α_t(j) = Σ_{i=1}^{N} α_{t-1}(i) × a_ij × b_j(o_t)
• Termination: P(O | λ) = α_T(q_F) = Σ_{i=1}^{N} α_T(i) × a_iF
Preview
• P(q_t = i, O | λ) = α_t(i) × β_t(i)
• α_t(i) and β_t(i) are used to compute the quantities δ_t(i, j) and γ_t(j) introduced below.
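A quick numerical check of this identity under the assumed toy model: summing α_t(i) × β_t(i) over i yields the same P(O | λ) at every time step t.

```python
import numpy as np

pi = np.array([0.8, 0.2])            # assumed toy model
A  = np.array([[0.6, 0.4],
               [0.4, 0.6]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])
O  = [2, 0, 2]
N, T = len(pi), len(O)

alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]                           # forward pass
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
beta[-1] = 1.0                                       # backward pass (no explicit q_F)
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])

for t in range(T):
    print(t, np.sum(alpha[t] * beta[t]))             # identical for every t: P(O | lambda)
```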
Re-estimating the Transition Probabilities (A)
• Define a new variable:
• δ_t(i, j): given O and λ, the probability of being in state i at time t and in state j at time t+1: P(q_t = i, q_{t+1} = j | O, λ)
Note
δ_t(i, j) is the probability of being in i at time t and in j at time t+1, given the observations. To obtain it, we first need the joint probability
P(q_t = i, q_{t+1} = j, O | λ) = α_t(i) × a_ij × b_j(o_{t+1}) × β_{t+1}(j),
and then δ_t(i, j) = P(q_t = i, q_{t+1} = j, O | λ) / P(O | λ).
Re-estimating the Transition Probabilities (A)
• â_ij = Σ_{t=1}^{T-1} δ_t(i, j) / Σ_{t=1}^{T-1} Σ_{k=1}^{N} δ_t(i, k)
• This can be read as (the expected number of transitions from i to j) / (the expected number of transitions out of i).
Re-estimating the Observation Probabilities (B)
• Define a new variable:
• γ_t(j): given O and λ, the probability of being in state j at time t: P(q_t = j | O, λ) = α_t(j) × β_t(j) / P(O | λ)
Re-estimating the Observation Probabilities (B)
• b̂_j(v_k) = Σ_{t=1, s.t. o_t = v_k}^{T} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
• This can be read as (the expected number of times in state j observing symbol v_k) / (the expected number of times in state j).
• In the numerator we sum γ_t(j) over the time steps t in which the observation o_t is the symbol v_k; in the denominator we sum γ_t(j) over all time steps t.
• The result is the percentage of the time that we were in state j and saw the symbol v_k.
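Putting the pieces together, below is a Python sketch of a single Baum-Welch iteration under the assumed toy model: the E-step builds γ and δ (often written ξ in the literature) from α and β, and the M-step re-estimates A and B exactly as in the two ratio formulas above.

```python
import numpy as np

pi = np.array([0.8, 0.2])            # assumed toy model (no explicit final state)
A  = np.array([[0.6, 0.4],
               [0.4, 0.6]])
B  = np.array([[0.2, 0.4, 0.4],
               [0.5, 0.4, 0.1]])
O  = [2, 0, 2, 1, 0]                 # a slightly longer toy observation sequence
N, T, K = 2, len(O), B.shape[1]

# E-step: forward and backward passes.
alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
PO = alpha[-1].sum()                                 # P(O | lambda)

gamma = alpha * beta / PO                            # gamma_t(j) = P(q_t = j | O, lambda)
delta = np.zeros((T - 1, N, N))                      # delta_t(i,j) = P(q_t=i, q_{t+1}=j | O, lambda)
for t in range(T - 1):
    delta[t] = alpha[t][:, None] * A * (B[:, O[t+1]] * beta[t+1])[None, :] / PO

# M-step: expected counts -> new parameters.
A_new = delta.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
B_new = np.zeros_like(B)
for k in range(K):
    mask = np.array([o == k for o in O])             # time steps where o_t = v_k
    B_new[:, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
print(A_new, B_new, sep="\n")                        # each row sums to 1
```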
Forward-Backward Algorithm
Summary(3)
• In the E-step, we compute the expected state occupancy count γ and the expected state transition count δ from the earlier A and B probabilities.
• In the M-step, we use γ and δ to recompute new A and B probabilities.
• Goal: P(O | λ') > P(O | λ)
• Conceptually the forward-backward algorithm makes unsupervised learning possible, but in practice the initial values matter a great deal; it is common to first define λ = (A, B) and then train A and B from there.
Notes
• P(X | Y, Z) = P(X, Y, Z) / P(Y, Z)
P(X, Y | Z) = P(X, Y, Z) / P(Z)
P(Y | Z) = P(Y, Z) / P(Z)
https://math.stackexchange.com/questions/301207/why-is-px-yz-pyx-zpxz?answertab=active#tab-top
https://www.cl.cam.ac.uk/teaching/1011/ArtIntII/basic-probability.pdf
• argmax(f(x)): the value of x that maximizes f(x).
Ex) argmax(cos(x)) = {0, 2π} (0 ≤ x ≤ 2π)
• argmin(f(x)): the value of x that minimizes f(x).
Ex) argmin(cos(x)) = {π} (0 ≤ x ≤ 2π)
Reference
• Daniel Jurafsky & James H. Martin, Speech and Language Processing, Chapter 9.
• www.cedar.buffalo.edu/~govind/CS661/Lec12.ppt
• https://github.com/dsindex/blog/wiki/%5BHMM%5D--Hidden-Markov-Model-%EA%B5%AC%ED%98%84-%EA%B4%80%EC%A0%90%EC%97%90%EC%84%9C
• http://blog.daum.net/hazzling/15669927
• [Rabiner, 1986] L. Rabiner and B. Juang, "An Introduction to Hidden Markov Models," IEEE ASSP Magazine, 1986.