[Study Notes] [Machine Learning] 12. [Part 2] Hidden Markov Models (Markov Chains, the Three HMM Problems, the Forward-Backward Algorithm, the Viterbi Algorithm, the Baum-Welch Algorithm, API and Examples)

5. Decoding the Hidden State Sequence $Q$ with the Viterbi Algorithm

Learning objectives:

  • Know how the Viterbi algorithm decodes the hidden state sequence $Q$

In this part we discuss decoding the hidden state sequence $Q$ with the Viterbi algorithm: given the model $\lambda$ and the observation sequence $O$, find the most likely corresponding hidden state sequence $Q^*$.

The most common algorithm for the HMM decoding problem is the Viterbi algorithm, although other algorithms can also solve it. The Viterbi algorithm is a general dynamic-programming algorithm for finding the shortest (optimal) path through a sequence, so it can be used for many other problems as well.

5.1 Overview of Finding the Most Likely Hidden State Sequence

The HMM decoding problem is:

Given the model $\lambda = (A, B, \Pi)$ and the observation sequence $O = o_1, o_2, \dots, o_T$, find the most likely corresponding hidden state sequence $Q^* = q^*_1, q^*_2, \dots, q^*_T$, i.e. maximize $P(Q^* \mid O)$.

A possible approximate solution is to find, at each time $t$, the most likely hidden state $q^*_t$ given the observation sequence $O$, yielding an approximate hidden state sequence $Q^* = q^*_1, q^*_2, \dots, q^*_T$. This approximation is easy to compute using a quantity already defined for the forward-backward algorithm when evaluating the observation sequence probability:

  • Given the model $\lambda$ and the observation sequence $O$, the probability of being in state $q_i$ at time $t$ is $\gamma_t(i)$, which can be computed with the HMM forward and backward algorithms. We then have:

$$q^*_t = \underset{1 \le i \le N}{\arg\max}\,[\gamma_t(i)], \quad t = 1, 2, \dots, T$$

where:

  • $A$: the state transition matrix, where $a_{ij}$ is the probability of moving from hidden state $i$ to hidden state $j$.
  • $B$: the observation probability matrix, where $b_j(k)$ is the probability of observing symbol $k$ in hidden state $j$.
  • $\Pi$: the initial state probability vector, where $\pi_i$ is the probability that the initial hidden state is $i$.
  • $O$: the observation sequence, where $o_t$ is the observation at time $t$.
  • $\lambda$: the HMM parameters, consisting of the state transition matrix $A$, the observation probability matrix $B$, and the initial state probability vector $\Pi$.
  • $Q^*$: the most likely (predicted) hidden state sequence, where $q^*_t$ is the most likely hidden state at time $t$.
  • $\gamma_t(i)$: the probability of being in state $q_i$ at time $t$, given the model $\lambda$ and the observation sequence $O$.

The approximate algorithm is simple, but it cannot guarantee that the predicted state sequence $Q^*$ is the most likely sequence $Q_{\text{best}}$ as a whole, because some adjacent hidden states in the predicted sequence may have transition probability $a_{ij} = 0$, so the predicted sequence may not even be a feasible path.
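As a concrete illustration, the following is a minimal NumPy sketch of this approximate decoder (illustrative code, not from any library): it runs the forward and backward recursions, forms $\gamma_t(i) \propto \alpha_t(i)\beta_t(i)$, and takes the per-step argmax. The parameters are those of the box-and-ball example used later in Section 5.4.

import numpy as np

# Box-and-ball parameters from Section 5.4 (assumed here for illustration)
Pi = np.array([0.2, 0.4, 0.4])            # initial state distribution
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])           # state transition matrix
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])                # observation probability matrix
O = [0, 1, 0]                             # red, white, red
T, N = len(O), len(Pi)

# Forward pass: alpha[t, i] = P(o_1..o_t, q_t = i | lambda)
alpha = np.zeros((T, N))
alpha[0] = Pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
print(alpha[-1].sum())  # P(O | lambda) ≈ 0.13022 (cf. Section 7.3)

# Backward pass: beta[t, i] = P(o_{t+1}..o_T | q_t = i, lambda)
beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# gamma[t, i] = P(q_t = i | O, lambda); decode each time step independently
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma.argmax(axis=1) + 1)  # 1-indexed states; prints [3 2 3] for this model

For this model the per-step argmax gives the sequence (3, 2, 3), whereas the Viterbi path computed in Section 5.4 is (3, 3, 3): the two decoders optimize different criteria, and the approximate one does not consider the sequence as a whole.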

The Viterbi algorithm instead treats the HMM state sequence as a whole, avoiding the problem of the approximate algorithm. Below we look at how the Viterbi algorithm decodes an HMM.

5.2 Viterbi Algorithm Overview

The Viterbi algorithm is a general decoding algorithm: a dynamic-programming method for finding the shortest (optimal) path through a sequence. Since it is a dynamic-programming algorithm, we need to find suitable local states and a recursion formula over those local states.

In an HMM, the Viterbi algorithm defines two local states for the recursion:

[First local state $\delta_t(i)$] The first local state is the maximum probability over all state-sequence prefixes $q_1, q_2, \dots, q_{t-1}$ ending in hidden state $i$ at time $t$, denoted $\delta_t(i)$:

$$\delta_t(i) = \max_{q_1, q_2, \dots, q_{t-1}} P(q_t = i,\, q_1, q_2, \dots, q_{t-1},\, o_t, o_{t-1}, \dots, o_1 \mid \lambda), \quad i = 1, 2, \dots, N$$

From the definition of $\delta_t(i)$ we obtain the recursion for $\delta$:

$$\begin{aligned} \delta_{t+1}(i) &= \max_{q_1, q_2, \dots, q_t} P(q_{t+1} = i,\, q_1, q_2, \dots, q_t,\, o_{t+1}, o_t, \dots, o_1 \mid \lambda) \\ &= \max_{1 \le j \le N}\,[\delta_t(j)\, a_{ji}]\; b_i(o_{t+1}) \end{aligned}$$

[Second local state $\psi_t(i)$] The second local state is obtained recursively from the first.

We define $\psi_t(i)$ as the hidden state at time $t-1$ on the highest-probability path among all single-step transition paths $(q_1, q_2, \dots, q_{t-1})$ ending in hidden state $i$ at time $t$. Its recursion is:

$$\psi_t(i) = \underset{1 \le j \le N}{\arg\max}\,[\delta_{t-1}(j)\, a_{ji}]$$

With these two local states we can recurse from time 1 to time $T$, and then use the most likely previous state nodes recorded in $\psi_t(i)$ to backtrack until the optimal hidden state sequence is found.

where:

  • $A$: the state transition matrix, where $a_{ij}$ is the probability of moving from hidden state $i$ to hidden state $j$.
  • $B$: the observation probability matrix, where $b_j(k)$ is the probability of observing symbol $k$ in hidden state $j$.
  • $\Pi$: the initial state probability vector, where $\pi_i$ is the probability that the initial hidden state is $i$.
  • $O$: the observation sequence, where $o_t$ is the observation at time $t$.
  • $\lambda$: the HMM parameters, consisting of the state transition matrix $A$, the observation probability matrix $B$, and the initial state probability vector $\Pi$.
  • $\delta_t(i)$: the maximum probability over all state-sequence prefixes ending in hidden state $i$ at time $t$.
  • $\psi_t(i)$: the hidden state at time $t-1$ on the highest-probability path among all single-step transition paths ending in hidden state $i$ at time $t$.

The Viterbi algorithm solves for the optimal state sequence by dynamic programming; it is both efficient and exact, running in $O(TN^2)$ time.

5.3 Summary of the Viterbi Algorithm Process

  • Input: the HMM $\lambda = (A, B, \Pi)$ and the observation sequence $O = (o_1, o_2, \dots, o_T)$
  • Output: the most likely hidden state sequence $Q^* = q^*_1, q^*_2, \dots, q^*_T$

The process is as follows:

Step 1: Initialize the local states:

$$\begin{aligned} & \delta_1(i) = \pi_i b_i(o_1), \quad i = 1, 2, \dots, N \\ & \psi_1(i) = 0, \quad i = 1, 2, \dots, N \end{aligned}$$

Step 2: Run the dynamic-programming recursion of the local states for times $t = 2, 3, \dots, T$:

$$\begin{aligned} & \delta_t(i) = \max_{1 \le j \le N}\,[\delta_{t-1}(j)\, a_{ji}]\; b_i(o_t), \quad i = 1, 2, \dots, N \\ & \psi_t(i) = \underset{1 \le j \le N}{\arg\max}\,[\delta_{t-1}(j)\, a_{ji}], \quad i = 1, 2, \dots, N \end{aligned}$$

Step 3: The largest $\delta_T(i)$ at time $T$ is the probability of the most likely hidden state sequence, and its maximizing index is the most likely hidden state at time $T$:

$$\begin{aligned} & P^* = \max_{1 \le i \le N} \delta_T(i) \\ & q^*_T = \underset{1 \le i \le N}{\arg\max}\,[\delta_T(i)] \end{aligned}$$

Step 4: Backtrack using the local states $\psi_t(i)$. For $t = T-1, T-2, \dots, 1$:

$$q^*_t = \psi_{t+1}(q^*_{t+1})$$

This finally yields the most likely hidden state sequence $Q^* = q^*_1, q^*_2, \dots, q^*_T$.


5.4 Worked Example of the HMM Viterbi Algorithm

Below we again use the box-and-ball example to walk through the HMM Viterbi solution. Our observation set is:

$$V = \{\text{red}, \text{white}\}, \quad M = 2$$

Our state set is:

$$Q = \{\text{box 1}, \text{box 2}, \text{box 3}\}, \quad N = 3$$

The observation sequence $O$ and the state sequence both have length 3.

The initial state distribution is:

$$\Pi = (0.2, 0.4, 0.4)^T$$

The state transition probability matrix $A$ (hidden, not observable) is:

$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}_{N \times N = 3 \times 3}$$

Rows index the current box and columns the next box: $a_{ij}$ is the probability of moving from box $i$ to box $j$ between consecutive draws (the transition applies from the second draw onward).

The observation probability matrix $B$ (observable) is:

$$B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}_{N \times M = 3 \times 2}$$

Rows index the box; column 1 is the probability of drawing a red ball and column 2 a white ball.

The observation sequence of ball colors is:

$$O = \{\text{red}, \text{white}, \text{red}\}$$


Following the Viterbi algorithm above, we first compute the two local states for each of the three hidden states at time 1, where the observation is red:

$$\begin{aligned} & \delta_1(1) = \pi_1 b_1(o_1) = \underset{\text{box 1}}{0.2} \times \underset{\text{red}}{0.5} = 0.1 \\ & \delta_1(2) = \pi_2 b_2(o_1) = \underset{\text{box 2}}{0.4} \times \underset{\text{red}}{0.4} = 0.16 \\ & \delta_1(3) = \pi_3 b_3(o_1) = \underset{\text{box 3}}{0.4} \times \underset{\text{red}}{0.7} = 0.28 \\ & \psi_1(1) = \psi_1(2) = \psi_1(3) = 0 \end{aligned}$$

$\psi_1(1) = \psi_1(2) = \psi_1(3) = 0$ because the initialization sets them to 0.

Now we recurse the two local states for each of the three hidden states at time 2, where the observation is white:

$$\begin{aligned} \delta_2(1) &= \max_{1 \le j \le 3}[\delta_1(j) a_{j1}]\, b_1(o_2) = \max[\underset{\text{box 1} \to \text{box 1}}{0.1 \times 0.5},\ \underset{\text{box 2} \to \text{box 1}}{0.16 \times 0.3},\ \underset{\text{box 3} \to \text{box 1}}{0.28 \times 0.2}] \times \underset{\text{white}}{0.5} = 0.028, \quad \psi_2(1) = 3 \\ \delta_2(2) &= \max_{1 \le j \le 3}[\delta_1(j) a_{j2}]\, b_2(o_2) = \max[\underset{\text{box 1} \to \text{box 2}}{0.1 \times 0.2},\ \underset{\text{box 2} \to \text{box 2}}{0.16 \times 0.5},\ \underset{\text{box 3} \to \text{box 2}}{0.28 \times 0.3}] \times \underset{\text{white}}{0.6} = 0.0504, \quad \psi_2(2) = 3 \\ \delta_2(3) &= \max_{1 \le j \le 3}[\delta_1(j) a_{j3}]\, b_3(o_2) = \max[\underset{\text{box 1} \to \text{box 3}}{0.1 \times 0.3},\ \underset{\text{box 2} \to \text{box 3}}{0.16 \times 0.2},\ \underset{\text{box 3} \to \text{box 3}}{0.28 \times 0.5}] \times \underset{\text{white}}{0.3} = 0.042, \quad \psi_2(3) = 3 \end{aligned}$$

Here $\psi_2(i)$ is the 1-based index $j$ attaining the maximum, i.e. the most likely previous box.

We continue the recursion for the two local states at time 3, where the observation is red:

$$\begin{aligned} \delta_3(1) &= \max_{1 \le j \le 3}[\delta_2(j) a_{j1}]\, b_1(o_3) = \max[\underset{\text{box 1} \to \text{box 1}}{0.028 \times 0.5},\ \underset{\text{box 2} \to \text{box 1}}{0.0504 \times 0.3},\ \underset{\text{box 3} \to \text{box 1}}{0.042 \times 0.2}] \times \underset{\text{red}}{0.5} = 0.00756, \quad \psi_3(1) = 2 \\ \delta_3(2) &= \max_{1 \le j \le 3}[\delta_2(j) a_{j2}]\, b_2(o_3) = \max[\underset{\text{box 1} \to \text{box 2}}{0.028 \times 0.2},\ \underset{\text{box 2} \to \text{box 2}}{0.0504 \times 0.5},\ \underset{\text{box 3} \to \text{box 2}}{0.042 \times 0.3}] \times \underset{\text{red}}{0.4} = 0.01008, \quad \psi_3(2) = 2 \\ \delta_3(3) &= \max_{1 \le j \le 3}[\delta_2(j) a_{j3}]\, b_3(o_3) = \max[\underset{\text{box 1} \to \text{box 3}}{0.028 \times 0.3},\ \underset{\text{box 2} \to \text{box 3}}{0.0504 \times 0.2},\ \underset{\text{box 3} \to \text{box 3}}{0.042 \times 0.5}] \times \underset{\text{red}}{0.7} = 0.0147, \quad \psi_3(3) = 3 \end{aligned}$$

Recall that $\delta_t(i)$ is the maximum probability over all paths ending in hidden state $i$ at time $t$, and $\psi_t(i)$ records the hidden state at time $t-1$ on that highest-probability path.

In the example above, the largest probability at the final time is $\delta_3(3) = 0.0147$, so $P^* = 0.0147$ and the most likely hidden state at time 3 is $q^*_3 = 3$.

Next we backtrack with the local states $\psi_t(i)$ to obtain the optimal state sequence. Since $\psi_3(3) = 3$, we get $q^*_2 = 3$; since $\psi_2(3) = 3$, we get $q^*_1 = 3$. The final optimal state sequence is therefore $\{3, 3, 3\}$.

The Viterbi algorithm is, once again, an application of the dynamic-programming idea.
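To make the four steps concrete, here is a minimal NumPy sketch of the Viterbi algorithm run on this example (illustrative code, not from any library); it reproduces $P^* = 0.0147$ and the path (3, 3, 3).

import numpy as np

# Box-and-ball example parameters
Pi = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
O = [0, 1, 0]  # red, white, red
T, N = len(O), len(Pi)

delta = np.zeros((T, N))            # delta[t, i]: best path probability ending in state i at time t
psi = np.zeros((T, N), dtype=int)   # psi[t, i]: best predecessor of state i at time t

# Step 1: initialization
delta[0] = Pi * B[:, O[0]]

# Step 2: recursion
for t in range(1, T):
    for i in range(N):
        trans = delta[t - 1] * A[:, i]        # delta_{t-1}(j) * a_{ji} for all j
        psi[t, i] = trans.argmax()
        delta[t, i] = trans.max() * B[i, O[t]]

# Step 3: termination
P_star = delta[T - 1].max()
path = [delta[T - 1].argmax()]

# Step 4: backtracking
for t in range(T - 1, 0, -1):
    path.append(psi[t, path[-1]])
path.reverse()

print(P_star)                          # ≈ 0.0147
print([state + 1 for state in path])   # [3, 3, 3] (1-indexed boxes)

In practice the recursion is usually done in log space (summing log probabilities instead of multiplying) to avoid numerical underflow on long sequences.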


Summary:

  • Input: the HMM $\lambda = (A, B, \Pi)$ and the observation sequence $O = (o_1, o_2, \dots, o_T)$
  • Output: the most likely hidden state sequence $Q^* = q^*_1, q^*_2, \dots, q^*_T$

The process is as follows:

Step 1: Initialize the local states:

$$\begin{aligned} & \delta_1(i) = \pi_i b_i(o_1), \quad i = 1, 2, \dots, N \\ & \psi_1(i) = 0, \quad i = 1, 2, \dots, N \end{aligned}$$

Step 2: Run the dynamic-programming recursion of the local states for times $t = 2, 3, \dots, T$:

$$\begin{aligned} & \delta_t(i) = \max_{1 \le j \le N}\,[\delta_{t-1}(j)\, a_{ji}]\; b_i(o_t), \quad i = 1, 2, \dots, N \\ & \psi_t(i) = \underset{1 \le j \le N}{\arg\max}\,[\delta_{t-1}(j)\, a_{ji}], \quad i = 1, 2, \dots, N \end{aligned}$$

Step 3: The largest $\delta_T(i)$ at time $T$ is the probability of the most likely hidden state sequence, and its maximizing index is the most likely hidden state at time $T$:

$$\begin{aligned} & P^* = \max_{1 \le i \le N} \delta_T(i) \\ & q^*_T = \underset{1 \le i \le N}{\arg\max}\,[\delta_T(i)] \end{aligned}$$

Step 4: Backtrack using the local states $\psi_t(i)$. For $t = T-1, T-2, \dots, 1$:

$$q^*_t = \psi_{t+1}(q^*_{t+1})$$

This finally yields the most likely hidden state sequence $Q^* = q^*_1, q^*_2, \dots, q^*_T$.


6. Introduction to the Baum-Welch Algorithm

Learning objective:

  • Understand the Baum-Welch algorithm

6.1 Problem Introduction

The model parameter learning problem is solved by the Baum-Welch algorithm (the hidden states are unknown): given an observation sequence $O = \{o_1, o_2, \dots, o_T\}$, estimate the parameters $\lambda = (A, B, \Pi)$ that maximize the probability of the observation sequence under the model, $P(O \mid \lambda)$.

The most common solution is the Baum-Welch algorithm, which is in fact an EM algorithm; in the era when the Baum-Welch algorithm appeared, however, the EM algorithm had not yet been abstracted as a general framework, so it is still called the Baum-Welch algorithm.


6.2 Principle of the Baum-Welch Algorithm

Since the Baum-Welch algorithm follows the EM principle, in the E step we need the expectation of the log joint distribution $\log P(O, I \mid \lambda)$ under the conditional distribution $P(I \mid O, \overline{\lambda})$, where $\overline{\lambda}$ is the current model parameter and $I$ is the hidden state sequence; in the M step we maximize this expectation to obtain the updated parameter $\lambda$.


Look at the E step first. With current parameters $\overline{\lambda}$, the expectation of $\log P(O, I \mid \lambda)$ under the conditional distribution $P(I \mid O, \overline{\lambda})$ is:

$$L(\lambda, \overline{\lambda}) = \sum_I P(I \mid O, \overline{\lambda}) \log P(O, I \mid \lambda)$$

In the M step we maximize this expression to obtain the updated model parameters:

$$\overline{\lambda} = \underset{\lambda}{\arg\max} \sum_I P(I \mid O, \overline{\lambda}) \log P(O, I \mid \lambda)$$

We iterate the E and M steps until $\overline{\lambda}$ converges.
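In practice you rarely implement Baum-Welch by hand: the fit() method of hmmlearn (introduced in the next section) runs exactly this EM iteration. A minimal sketch, assuming a toy red/white observation sequence made up for illustration:

import numpy as np
from hmmlearn import hmm

# Toy red(0)/white(1) observation sequence, made up for illustration
X = np.array([[0, 1, 0, 0, 1, 1, 0, 1, 0, 0]]).T  # shape (n_samples, 1)

# n_iter bounds the number of EM (Baum-Welch) iterations; tol is the
# convergence threshold on the gain in log-likelihood
model = hmm.MultinomialHMM(n_components=3, n_iter=100, tol=1e-4,
                           random_state=0)
model.fit(X)

print(model.startprob_)      # learned Pi
print(model.transmat_)       # learned A
print(model.emissionprob_)   # learned B

Since the likelihood surface has many local maxima, EM is sensitive to initialization; fixing random_state makes the result reproducible.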

7. HMM Model API Introduction

Learning objective:

  • Know how to use the HMM model API

7.1 Installing the API

Official website link: https://hmmlearn.readthedocs.io/en/latest/

pip install hmmlearn==0.2.5

7.2 Introduction to hmmlearn

hmmlearn implements three HMM model classes, which divide into two kinds according to whether the observation states are continuous or discrete:

  • GaussianHMM and GMMHMM are HMM models for continuous observation states
  • MultinomialHMM is the model for discrete observation states, and is the model used throughout this HMM series
  • GaussianHMM: Gaussian hidden Markov model
  • GMMHMM: Gaussian mixture hidden Markov model
  • MultinomialHMM: multinomial hidden Markov model

Here we mainly introduce the MultinomialHMM model for the discrete observation states discussed above; it is relatively simple to use.

from hmmlearn import hmm

model = hmm.MultinomialHMM(n_components=1, startprob_prior=1.0,
                           algorithm='viterbi', random_state=None,
                           n_iter=10, tol=0.01, verbose=False,
                           params='ste', init_params='ste')
  • Role: hmm.MultinomialHMM() is a class in the hmmlearn library used to create a hidden Markov model with multinomial (discrete) emissions.
  • Parameters:
    • n_components: (int) number of hidden states
    • n_iter: (int, optional) maximum number of iterations during training
    • tol: (float, optional) convergence threshold; EM stops if the gain in log-likelihood falls below this value
    • verbose: (bool, optional) when True, the per-iteration convergence score is printed to standard output
    • init_params: (string, optional) determines which parameters are initialized before training
      • 's' means startprob: corresponds to our initial hidden state distribution $\Pi$
      • 't' means transmat: corresponds to our state transition matrix $A$
      • 'e' means emissionprob: corresponds to our observation probability matrix $B$
      • an empty string "" means all parameters are supplied by the user
  • Methods:
    • fit()
    • decode()
    • score()
    • etc.

7.3 MultinomialHMM Example

Let's run the box-and-ball example discussed earlier through MultinomialHMM.

import numpy as np
from hmmlearn import hmm

# Set of hidden states
states = ["box 1", "box 2", "box 3"]
n_states = len(states)

# Set of observation states
observations = ["red", "white"]
n_observations = len(observations)

# Initial state distribution
start_probability = np.array([0.2, 0.4, 0.4])

# State transition probability matrix
transition_probability = np.array([[0.5, 0.2, 0.3],
                                   [0.3, 0.5, 0.2],
                                   [0.2, 0.3, 0.5]])

# Observation probability matrix
emission_probability = np.array([[0.5, 0.5],
                                 [0.4, 0.6],
                                 [0.7, 0.3]])

# Define the model
model = hmm.MultinomialHMM(n_components=n_states)

# Set the model parameters
model.startprob_ = start_probability        # initial state distribution
model.transmat_ = transition_probability    # state transition probability matrix
model.emissionprob_ = emission_probability  # observation probability matrix

Now let's solve HMM problem 3, decoding with the Viterbi algorithm, using the same observation sequence as before:

seen = np.array([[0, 1, 0]]).T  # the observation sequence (red, white, red)

# Viterbi decoding of the hidden state sequence
box = model.predict(seen)

# note: flatten turns seen from 2-D back into 1-D for printing
print("The ball observation sequence:", ' → '.join(map(lambda x: observations[x], seen.flatten())))
print("The most likely hidden state sequence:", ' → '.join(map(lambda x: states[x], box)))

The map function converts the indices in seen to the names in observations, and the indices in box to the names in states.

The ball observation sequence: red → white → red
The most likely hidden state sequence: box 3 → box 3 → box 3
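As a side note, a small sketch using the same hmmlearn API: decode() returns the Viterbi log probability together with the state path in one call, which lets us check the hand-computed $P^* = 0.0147$ from Section 5.4:

# decode() returns (log probability of the best path, state path)
logprob, path = model.decode(seen, algorithm="viterbi")

print(np.exp(logprob))             # ≈ 0.0147, the hand-computed P* from Section 5.4
print([states[i] for i in path])   # ['box 3', 'box 3', 'box 3']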

Now let's look at HMM problem 1, computing the probability of the observation sequence:

prob = model.score(seen)

print(f"Log probability of the observation sequence: {prob}")

Log probability of the observation sequence: -2.038545309915233

Note that the score function returns the log probability (natural logarithm). Our manual calculation in HMM problem 1 gave the raw probability, without the logarithm, as 0.13022. To compare, take the exponential:

import math

prob_true = math.exp(prob)
print(f"Probability of the observation sequence: {prob_true * 100:.3f}%")

Probability of the observation sequence: 13.022%
