An Introduction to HMM

Browny
2010.07.21
MM vs. HMM

[Figure: in a Markov model the states themselves are observed; in a hidden
Markov model the states are hidden and only the observations are visible]
Markov Model
• Given 3 weather states:
  – {S1, S2, S3} = {rain, cloudy, sunny}
                    Rain   Cloudy   Sunny
          Rain      0.4    0.3      0.3
          Cloudy    0.2    0.6      0.2
          Sunny     0.1    0.1      0.8


• What is the probability that the next 7 days
  will be {sun, sun, rain, rain, sun, cloud, sun}?
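
• A minimal Python sketch (not part of the original slides) that scores this
  sequence against the table above, assuming today's weather is known to be
  sunny:

    # Transition table from the slide: A[today][tomorrow]
    A = {
        'rain':   {'rain': 0.4, 'cloudy': 0.3, 'sunny': 0.3},
        'cloudy': {'rain': 0.2, 'cloudy': 0.6, 'sunny': 0.2},
        'sunny':  {'rain': 0.1, 'cloudy': 0.1, 'sunny': 0.8},
    }

    # Today (assumed sunny) followed by the 7 asked-about days
    seq = ['sunny', 'sunny', 'sunny', 'rain', 'rain', 'sunny', 'cloudy', 'sunny']

    p = 1.0
    for prev, cur in zip(seq, seq[1:]):  # product of the 7 transition probabilities
        p *= A[prev][cur]
    print(p)  # 0.8*0.8*0.1*0.4*0.3*0.1*0.2 = 1.536e-04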
Hidden Markov Model
• The states
  – We cannot observe them directly: hidden!
  – But they can be observed indirectly

• Example
  – North Pole or equator (model), Hot/Cold (state),
    1/2/3 ice creams (observation)
Hidden Markov Model
• The observation is a probabilistic function
  of the state, which is not directly observable

[Figure: hidden states generating the visible observations]
HMM Elements
• N, the number of states in the model
• M, the number of distinct observation
  symbols
• A, the state transition probability distribution
• B, the observation symbol probability
  distribution in states
• π, the initial state distribution
• λ = (A, B, π) denotes the whole model
Example
• B (observation):
               P(…|C)   P(…|H)
    P(1|…)      0.7      0.1
    P(2|…)      0.2      0.2
    P(3|…)      0.1      0.7

• A (transition) and π (initial):
               P(…|C)   P(…|H)   P(…|Start)
    P(C|…)      0.8      0.1       0.5
    P(H|…)      0.1      0.8       0.5
    P(STOP|…)   0.1      0.1       0
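
• In code, these parameters map onto the (A, B, π) elements of the previous
  slide. A Python/NumPy sketch (state order C, H is an assumption; the STOP
  column is dropped, since the standard elements have no stop state):

    import numpy as np

    states, symbols = ['C', 'H'], [1, 2, 3]

    # B: observation probabilities, rows = state, columns = symbol
    B = np.array([[0.7, 0.2, 0.1],    # P(1|C), P(2|C), P(3|C)
                  [0.1, 0.2, 0.7]])   # P(1|H), P(2|H), P(3|H)

    # A: transition probabilities, rows = from-state, columns = to-state
    # (rows sum to 0.9 because the table reserves P(STOP|·) = 0.1)
    A = np.array([[0.8, 0.1],         # P(C|C), P(H|C)
                  [0.1, 0.8]])        # P(C|H), P(H|H)

    # π: initial distribution, P(C|Start) and P(H|Start)
    pi = np.array([0.5, 0.5])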
3 Problems
1. Which model best matches the observations?
   P(observations | model)
2. Which state sequence best explains the
   observations, given the model?
   P(state sequence | observations, model)
3. Which model is most likely to have produced
   the observations?
   Which model maximizes P(observations | model)?
Solution 1
• Given the model, what is the probability of generating
  an observation sequence, P(O|λ)?

[Figure: trellis of states S1, S2, S3 over times t = 1, 2, 3, each state
emitting observation R1 or R2]

  What is the probability of observing R1, R1, R2?
Solution 1
• Consider one particular state sequence
                Q = q1, q2, …, qT

• The probability of generating a particular
  observation sequence from it is

  P(O|Q, λ) = P(O1|q1, λ) * P(O2|q2, λ) * … * P(OT|qT, λ)

            = bq1(O1) * bq2(O2) * … * bqT(OT)
Solution 1
• The probability of this particular state sequence is

  P(Q|λ) = πq1 * aq1q2 * aq2q3 * … * aq(T-1)qT

• Given the model, the probability of the observation
  sequence is the sum over all possible state sequences:

  P(O|λ) = Σ_{q1,q2,…,qT} P(O|Q, λ) * P(Q|λ)

         = Σ_{q1,q2,…,qT} πq1 * bq1(O1) * aq1q2 * bq2(O2) * … * aq(T-1)qT * bqT(OT)
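
• A brute-force Python sketch of this sum: enumerate every state sequence Q
  and accumulate P(O|Q, λ) * P(Q|λ). The ice-cream parameters from earlier are
  assumed, with 0-based symbol indices. This is only feasible for tiny T,
  which motivates the forward algorithm on the next slide:

    import itertools
    import numpy as np

    A  = np.array([[0.8, 0.1], [0.1, 0.8]])
    B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
    pi = np.array([0.5, 0.5])

    O = [2, 0, 2]        # observed ice creams 3, 1, 3 as 0-based indices
    N, T = len(pi), len(O)

    total = 0.0
    for Q in itertools.product(range(N), repeat=T):  # all N**T state paths
        p = pi[Q[0]] * B[Q[0], O[0]]                 # pi_q1 * b_q1(O1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]     # a_{q(t-1)qt} * b_qt(Ot)
        total += p
    print(total)                                     # P(O|λ)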
Solution 1
• Complexity (N: the number of states)
  – Direct evaluation takes about 2T*N^T operations:
    (2T-1)*N^T multiplications and N^T - 1 additions
    (N^T: the number of possible state sequences)
  – For N = 5 states and T = 100 observations, that is on
    the order of 2*100*5^100 ≈ 10^72 computations!!
• Forward Algorithm
  – Forward variable αt(i): the joint probability of the
    partial (forward) observation sequence O1, O2, O3, …, Ot
    and state Si at time t, given the model

      αt(i) = P(O1, O2, …, Ot, qt = Si | λ)
Solution 1
[Figure: the same trellis, for the case O1 = R1]

  α1(i) = πi * bi(O1),  1 ≤ i ≤ N

    α1(1) = π1 * b1(O1)
    α1(2) = π2 * b2(O1)
    α1(3) = π3 * b3(O1)

  α2(1) = [α1(1)*a11 + α1(2)*a21 + α1(3)*a31] * b1(O2)
  α2(2) = [α1(1)*a12 + α1(2)*a22 + α1(3)*a32] * b2(O2)
Forward Algorithm
• Initialization:

      α1(i) = πi * bi(O1),  1 ≤ i ≤ N

• Induction:

      αt+1(j) = [Σ_{i=1..N} αt(i) * aij] * bj(Ot+1),  1 ≤ t ≤ T-1,  1 ≤ j ≤ N

• Termination:

      P(O|λ) = Σ_{i=1..N} αT(i)
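
• A direct Python transcription of these three steps (a sketch; the
  ice-cream parameters from earlier are assumed):

    import numpy as np

    A  = np.array([[0.8, 0.1], [0.1, 0.8]])
    B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
    pi = np.array([0.5, 0.5])

    def forward(O):
        """Return the T x N trellis of forward variables for index sequence O."""
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                    # initialization
        for t in range(1, T):
            alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # induction
        return alpha

    alpha = forward([2, 0, 2])
    print(alpha[-1].sum())  # termination: P(O|λ), ~N^2*T work instead of 2T*N^T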
Backward Algorithm
• Forward Algorithm

      αt(i) = P(O1, O2, …, Ot, qt = Si | λ)

• Backward Algorithm
  – βt(i): the probability of the partial backward observation
    sequence Ot+1, Ot+2, …, OT, given state Si at time t
    (note the conditioning, unlike the joint forward variable)

      βt(i) = P(Ot+1, Ot+2, …, OT | qt = Si, λ)
Backward Algorithm
• Initialization

      βT(i) = 1,  1 ≤ i ≤ N

• Induction

      βt(i) = Σ_{j=1..N} aij * bj(Ot+1) * βt+1(j),  t = T-1, T-2, …, 1,  1 ≤ i ≤ N
Backward Algorithm
[Figure: the same trellis, for the case OT = R1]

  βT-1(1) = Σ_{j=1..N} a1j * bj(OT) * βT(j)
          = a11*b1(OT) + a12*b2(OT) + a13*b3(OT)    (since βT(j) = 1)
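
• The mirror-image Python sketch (same assumed parameters as before); the
  last line recovers P(O|λ) from the β's as a consistency check against the
  forward pass:

    import numpy as np

    A  = np.array([[0.8, 0.1], [0.1, 0.8]])
    B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
    pi = np.array([0.5, 0.5])

    def backward(O):
        """Return the T x N trellis of backward variables for index sequence O."""
        T, N = len(O), len(pi)
        beta = np.ones((T, N))                         # initialization: β_T(i) = 1
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t+1]] * beta[t+1])   # induction
        return beta

    O = [2, 0, 2]
    beta = backward(O)
    print((pi * B[:, O[0]] * beta[0]).sum())  # P(O|λ) = Σ_i π_i b_i(O1) β_1(i)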
Solution 2
• Which state sequence best explains the observations
  and the given model?
  P(state sequence | observations, model)

• There is no single exact solution; the problem can be
  solved in several ways, and different constraints on
  the state sequence lead to different solutions
Solution 2
• Example: choose the states qt that are individually
  most likely
  – γt(i): the probability of being in state Si at
    time t, given the observation sequence O and
    the model λ

      γt(i) = P(qt = Si, O | λ) / P(O|λ)
            = αt(i) * βt(i) / P(O|λ)
            = αt(i) * βt(i) / Σ_{i=1..N} αt(i) * βt(i)

      qt* = argmax_{1≤i≤N} [γt(i)],  1 ≤ t ≤ T
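
• Putting the two trellises together, a sketch of this posterior (γ)
  decoding; α and β are computed inline exactly as in the two sketches
  above, with the same assumed parameters:

    import numpy as np

    A  = np.array([[0.8, 0.1], [0.1, 0.8]])
    B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
    pi = np.array([0.5, 0.5])
    O  = [2, 0, 2]
    T, N = len(O), len(pi)

    alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # each row divided by P(O|λ)
    print(gamma.argmax(axis=1))                # q_t* = argmax_i γ_t(i) for each t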
Viterbi algorithm
• The most widely used criterion is to find
  the “single best state sequence”

      maximize P(Q|O, λ), equivalently maximize P(Q, O|λ)

• A formal technique for this exists, based on
  dynamic programming methods, and is
  called the Viterbi algorithm
Viterbi algorithm
• To find the single best state sequence, Q =
  {q1, q2, …, qT}, for the given observation
  sequence O = {O1, O2, …, OT}

• δt(i): the best score (highest probability) along a
  single path at time t which accounts for the
  first t observations and ends in state Si

      δt(i) = max_{q1,q2,…,qt-1} P[q1 q2 … qt-1, qt = Si, O1 O2 … Ot | λ]
Viterbi algorithm
• Initialization - δ1(i)
  – When t = 1 the most probable path to a
    state does not sensibly exist
  – However we use the probability of being in
    that state at t = 1 and emitting the first
    observation O1

      δ1(i) = πi * bi(O1),  1 ≤ i ≤ N
      ψ1(i) = 0
Viterbi algorithm
• Calculating δt(X) when t > 1
  – δt(X): the score of the most probable path to
    state X at time t
  – This path to X has to pass through one of the
    states A, B or C at time (t-1)

      Most probable path via A: δt-1(A) * aAX * bX(Ot)
Viterbi algorithm
• Recursion

      δt(j) = max_{1≤i≤N} [δt-1(i) * aij] * bj(Ot),  2 ≤ t ≤ T,  1 ≤ j ≤ N
      ψt(j) = argmax_{1≤i≤N} [δt-1(i) * aij]

• Termination

      P* = max_{1≤i≤N} [δT(i)]
      qT* = argmax_{1≤i≤N} [δT(i)]
Viterbi algorithm
• Path (state sequence) backtracking

      qt* = ψt+1(qt+1*),  t = T-1, T-2, …, 1

      qT-1* = ψT(qT*) = argmax_{1≤i≤N} [δT-1(i) * ai,qT*]
      …
      q1* = ψ2(q2*)
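
• The whole algorithm in one Python sketch (the assumed ice-cream parameters
  again); ψ stores the best predecessor of each state so the best path can be
  read back from the end:

    import numpy as np

    A  = np.array([[0.8, 0.1], [0.1, 0.8]])
    B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
    pi = np.array([0.5, 0.5])

    def viterbi(O):
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]                      # initialization
        for t in range(1, T):
            scores = delta[t-1][:, None] * A            # δ_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)              # best predecessor of each j
            delta[t] = scores.max(axis=0) * B[:, O[t]]  # recursion
        q = np.zeros(T, dtype=int)
        q[-1] = delta[-1].argmax()                      # termination
        for t in range(T - 2, -1, -1):
            q[t] = psi[t+1][q[t+1]]                     # backtracking
        return q, delta[-1].max()                       # best path and P*

    print(viterbi([2, 0, 2]))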
Solution 3
• Which model λ = (A, B, π) is most likely to
  have produced the observations?
  Which λ maximizes P(observations | model)?
• There is no known analytic solution. We
  can choose λ = (A, B, π) such that P(O|λ)
  is locally maximized using an iterative
  procedure
Baum-Welch Method
• Define ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)
  – The probability of being in state Si at time t,
    and in state Sj at time t+1

      ξt(i, j) = αt(i) * aij * bj(Ot+1) * βt+1(j) / P(O|λ)

               = αt(i) * aij * bj(Ot+1) * βt+1(j)
                 / Σ_{i=1..N} Σ_{j=1..N} αt(i) * aij * bj(Ot+1) * βt+1(j)
Baum-Welch Method
• γt(i): the probability of being in state Si at time
  t, given the observation sequence O and the
  model λ

      γt(i) = αt(i) * βt(i) / P(O|λ) = αt(i) * βt(i) / Σ_{i=1..N} αt(i) * βt(i)

• Relating γt(i) to ξt(i, j): summing ξt(i, j) over the
  destination state gives

      γt(i) = Σ_{j=1..N} ξt(i, j)
Baum-Welch Method
• The expected number of times that state Si is
  visited (summing over t = 1 … T-1):

      Σ_{t=1..T-1} γt(i) = expected number of transitions from Si

• Similarly, the expected number of transitions
  from state Si to state Sj:

      Σ_{t=1..T-1} ξt(i, j) = expected number of transitions from Si to Sj
Baum-Welch Method
• Re-estimation formulas for π, A and B

      π̄i = γ1(i)  (expected frequency of state Si at time t = 1)

      āij = Σ_{t=1..T-1} ξt(i, j) / Σ_{t=1..T-1} γt(i)

          = expected number of transitions from state Si to Sj
            / expected number of transitions from state Si

      b̄j(k) = Σ_{t=1..T, s.t. Ot=vk} γt(j) / Σ_{t=1..T} γt(j)

            = expected number of times in state j observing symbol vk
              / expected number of times in state j
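
• One full re-estimation pass as a Python sketch over a single observation
  sequence (the sequence and parameters are assumptions reusing the earlier
  example); α and β are computed as in the previous sketches:

    import numpy as np

    A  = np.array([[0.8, 0.1], [0.1, 0.8]])
    B  = np.array([[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]])
    pi = np.array([0.5, 0.5])
    O  = [2, 0, 2, 1, 0]                  # a toy ice-cream sequence (assumption)
    T, N, M = len(O), len(pi), B.shape[1]

    alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    PO = alpha[-1].sum()                  # P(O|λ)

    # ξ_t(i,j) for t = 1..T-1 and γ_t(i) for t = 1..T
    xi = np.array([np.outer(alpha[t], B[:, O[t+1]] * beta[t+1]) * A / PO
                   for t in range(T - 1)])
    gamma = alpha * beta / PO

    # re-estimation formulas from the slide
    pi_new = gamma[0]
    A_new  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new  = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    print(pi_new, A_new, B_new, sep='\n')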
Baum-Welch Method
• The re-estimated model λ̄ = (Ā, B̄, π̄) satisfies
  P(O|λ̄) ≥ P(O|λ)

• Iteratively using λ̄ in place of λ and repeating
  the re-estimation, we can improve P(O|λ) until
  some limiting point is reached