Hidden Markov Models (HMM learning, probability computation, decoding)

The Communication Model

When a sender (a person or a machine) transmits information, the signal travels through a medium (air or a wire); in the broad sense, this is encoding. The receiver restores the signal to the sender's original message according to agreed-upon rules; in the broad sense, this is decoding.

Speech recognition is the process by which the receiver recovers the sender's message from the received signal. How can we infer the source message $s_1, s_2, \cdots$ from the observed signals $o_1, o_2, \cdots$? Probabilistically, we look among all possible source messages for the one most likely to have produced the observed signals.

By Bayes' theorem,

P(s_1,s_2,\cdots|o_1,o_2,\cdots)=\frac{P(o_1,o_2,\cdots|s_1,s_2,\cdots)P(s_1,s_2,\cdots)}{P(o_1,o_2,\cdots)}

Once the signals $o_1,o_2,\cdots$ have been produced they do not change, so $P(o_1,o_2,\cdots)$ is a constant. The most likely source message is therefore

s_1,s_2,\cdots =\arg\max_{s_1,s_2,\cdots}P(s_1,s_2,\cdots|o_1,o_2,\cdots)= \arg\max_{s_1,s_2,\cdots}P(o_1,o_2,\cdots|s_1,s_2,\cdots)P(s_1,s_2,\cdots)

This maximization can be solved with a hidden Markov model.

The Markov Assumption and Markov Processes

Let $s_1,s_2,\cdots,s_t,\cdots$ be the sequence of daily maximum temperatures, where $s_t$ is the temperature random variable. Assume that in this random process the distribution of state $s_t$ depends only on the previous state (today's maximum temperature depends only on yesterday's), i.e.

P(s_t|s_1,s_2,\cdots,s_{t-1})=P(s_t|s_{t-1})

This assumption is called the Markov assumption, and a random process satisfying it is called a Markov process (a directed graph, i.e. a Bayesian network).

[Figure: state transition diagram over states m1, m2, m3, m4 with transition probabilities 1.0, 0.6, 0.3, 0.4, 0.7]

Pick a state at random as the initial state, then generate successive states according to the transition probabilities; after $T$ steps this yields a state sequence $s_1,\cdots,s_T$. If the run is long enough, the transition probability from $m_i$ to $m_j$ can be estimated as $\#(m_i,m_j)/\#(m_i)$: the number of observed transitions from $m_i$ to $m_j$ divided by the number of visits to $m_i$.
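
As a quick illustration, here is a minimal simulation sketch of this count-based estimate; the 3-state transition matrix reuses the box example introduced later in this post, and the function names are illustrative:

import numpy as np

def simulate_chain(A, T, seed=None):
    """Simulate a Markov chain for T steps; A is the N x N transition matrix."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    states = [rng.integers(N)]                   # random initial state
    for _ in range(T - 1):
        states.append(rng.choice(N, p=A[states[-1]]))
    return states

def estimate_A(states, N):
    """Estimate transition probabilities by counting: #(m_i, m_j) / #(m_i)."""
    counts = np.zeros((N, N))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    return counts / counts.sum(axis=1, keepdims=True)

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
print(estimate_A(simulate_chain(A, T=100000, seed=0), N=3))  # approaches A as T grows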

Hidden Markov Models and the Communication Model

A hidden Markov model describes a process in which a Markov chain generates an unobservable state sequence, which in turn generates an observation sequence. The hidden state sequence $s_1,s_2,\cdots$ is an ordinary Markov chain; because it is hidden, the model is called a "hidden" Markov model.

The two assumptions of the HMM:

  • Independent-output assumption: the observation $o_t$ that the HMM emits at time $t$ depends only on the hidden state $s_t$:
    P(o_t|s_1,\cdots,s_{t},o_1,\cdots,o_{t-1})=P(o_t|s_{t})

  • Markov assumption: the hidden state $s_t$ at time $t$ depends only on the hidden state $s_{t-1}$ at the previous time step:
    P(s_t|s_1,\cdots,s_{t-1},o_1,\cdots,o_{t-1})=P(s_t|s_{t-1})

By the Markov assumption and the independent-output assumption, the joint probability of the state sequence and the observation sequence (a generative model) is

P(s_1,s_2,\cdots,o_1,o_2,\cdots)=\prod_tP(s_t|s_{t-1})\cdot P(o_t|s_t)
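
Written out in code, the product is a straightforward loop; a sketch, with the convention $P(s_1|s_0):=\pi_{s_1}$ for the first factor and parameters taken from the three-box example introduced below:

import numpy as np

def joint_prob(S, O, PI, A, B):
    """P(S, O | lambda) = pi_{s_1} b_{s_1}(o_1) * prod_{t>1} a_{s_{t-1} s_t} b_{s_t}(o_t)."""
    p = PI[S[0]] * B[S[0], O[0]]                 # t = 1 factor uses pi instead of A
    for t in range(1, len(O)):
        p *= A[S[t - 1], S[t]] * B[S[t], O[t]]
    return p

PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
print(joint_prob([2, 2, 2], [0, 1, 0], PI, A, B))  # 0.4*0.7 * 0.5*0.3 * 0.5*0.7 = 0.0147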

The communication decoding problem can be solved with an HMM: the Viterbi algorithm finds the assignment that maximizes the probability above, and hence the most likely hidden states.


HMM Representation

Let the set of hidden states be $M = \{m_1,\cdots, m_N\}$, the set of observation symbols $V = \{v_1, \cdots, v_M\}$, the hidden state sequence $S = (s_1, \cdots, s_T)$, and the observation sequence $O = (o_1, \cdots, o_T)$.

I. State transition matrix
If the chain is in hidden state $m_i$ at time $t$ and in hidden state $m_j$ at time $t+1$, the transition probability from time $t$ to $t+1$ is

a_{ij} = P(s_{t+1} = m_j | s_t = m_i), \quad i,j = 1, 2, \cdots, N

The state transition matrix is $A = [a_{ij}]_{N \times N}$.

II. Observation probability matrix
If the chain is in hidden state $m_j$ at time $t$, the probability of emitting observation $v_k$ from $m_j$ is

b_j(k) = P(o_t = v_k | s_t = m_j), \quad k = 1,2,\cdots, M; \, j = 1, 2, \cdots, N

The observation probability matrix is $B = [b_j(k)]_{N\times M}$.

III. Initial state probability vector
The probability of being in state $m_i$ at the initial time $t=1$ is

\pi_i = P(s_1 = m_i), \quad i = 1, 2, \cdots, N

The initial state probability vector is $\Pi = (\pi_i)$.

In summary, $\Pi$ and $A$ determine the state sequence and $B$ determines the observation sequence, so an HMM is represented by the triple

\lambda=(A,B,\Pi)


Example: suppose there are 3 boxes, each containing red and white balls, as follows:

Box           1    2    3
Red balls     5    4    7
White balls   5    6    3

A box is chosen at random according to the initial probabilities, one ball is drawn from it and put back, and the process then moves to the next box. For example, the transition probabilities out of box 1 are

P(X=1|X=1)=0.5,\quad P(X=2|X=1)=0.2,\quad P(X=3|X=1)=0.3

Repeating this 5 times yields an observation sequence of ball colors

O = (\text{red}, \text{red}, \text{white}, \text{white}, \text{red})

Here the box sequence is the hidden state sequence and the color sequence is the known observation sequence. The three HMM components are

A = \left[\begin{matrix} 0.5 &0.2 &0.3 \\ 0.3 &0.5 &0.2 \\ 0.2 &0.3 &0.5 \end{matrix}\right] ,\quad B = \left[\begin{matrix} 0.5 &0.5 \\ 0.4 &0.6 \\ 0.7 &0.3 \end{matrix}\right] ,\quad \Pi=(0.2, 0.4, 0.4)^T
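
To make the generative story concrete, here is a sketch that samples from this $\lambda$; the seed and function name are illustrative:

import numpy as np

def sample_hmm(PI, A, B, T, seed=None):
    """Draw a (state, observation) sequence pair of length T from lambda = (A, B, PI)."""
    rng = np.random.default_rng(seed)
    s = rng.choice(len(PI), p=PI)
    states, obs = [s], [rng.choice(B.shape[1], p=B[s])]
    for _ in range(T - 1):
        s = rng.choice(A.shape[0], p=A[s])          # move to the next box
        states.append(s)
        obs.append(rng.choice(B.shape[1], p=B[s]))  # draw a ball, put it back
    return states, obs

PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
print(sample_hmm(PI, A, B, T=5, seed=1))            # 0 = red, 1 = white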


HMM Probability Computation

Problem: given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, compute the probability of $O$ under $\lambda$, i.e. $P(O|\lambda)$.

Can this be computed by enumeration? Enumerate every state sequence $S = (s_1, s_2, \cdots, s_T)$, compute the joint probability $P(O, S|\lambda)$ of $S$ and the observation sequence $O = (o_1, o_2, \cdots, o_T)$, and sum:

P(O|\lambda) = \sum_S P(O, S|\lambda) = \sum_{S}P(O|S, \lambda)P(S|\lambda)

There are $N^T$ possible hidden state sequences, so direct computation costs $O(TN^T)$, which is infeasible for models with many hidden states.
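
For small models the enumeration is still easy to write down and makes a useful correctness check; a sketch (only $3^3 = 27$ paths for the box example, but hopeless for large $T$):

from itertools import product
import numpy as np

def brute_force_prob(O, PI, A, B):
    """P(O | lambda) by summing P(O, S | lambda) over all N**T state paths."""
    N, T = A.shape[0], len(O)
    total = 0.0
    for S in product(range(N), repeat=T):        # all N**T hidden paths
        p = PI[S[0]] * B[S[0], O[0]]
        for t in range(1, T):
            p *= A[S[t - 1], S[t]] * B[S[t], O[t]]
        total += p
    return total

PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
print(brute_force_prob([0, 1, 0], PI, A, B))     # 0.130218, matches the forward algorithm below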

Forward Recursion

The forward algorithm is a dynamic-programming algorithm: it defines a local quantity, the forward probability, derives a recursion for it, and extends the subproblem solutions to the full problem. Given the model $\lambda$, the forward probability is the probability of observing $o_1, \cdots, o_t$ up to time $t$ and being in hidden state $s_t=q_i$:

\alpha_t(i) = P(o_1,\cdots,o_t,s_t=q_i|\lambda),\quad P(O|\lambda) = \sum_{i}\alpha_T(i),\quad \alpha_1(i) = \pi_i b_i(o_1)

By the homogeneous Markov property and the independent-output assumption, the forward recursion is

\begin{aligned} \alpha_{t+1}(i) &=P(o_1,\cdots,o_t,o_{t+1},s_{t+1}=q_i|\lambda)\\[1ex] &=\sum_jP(o_1,\cdots,o_t,o_{t+1},s_t=q_j,s_{t+1}=q_i|\lambda)\\ &=\sum_jP(s_{t+1}=q_i,o_{t+1}|o_1,\cdots,o_t,s_t=q_j,\lambda)P(o_1,\cdots,o_t,s_t=q_j|\lambda)\\ &=\sum_jP(s_{t+1}=q_i,o_{t+1}|s_t=q_j,\lambda)\alpha_t(j)\\ &=\sum_jP(o_{t+1}|s_t=q_j,s_{t+1}=q_i,\lambda)P(s_{t+1}=q_i|s_t=q_j,\lambda)\alpha_t(j)\\ &=\left[\sum_{j} \alpha_t(j) a_{ji}\right] b_i(o_{t+1}) \end{aligned}

The recursion computes $P(O|\lambda)$ over the lattice of state paths, caching subproblem solutions to avoid recomputation and thereby speeding up the calculation.

In matrix form, $\boldsymbol\alpha_1=\boldsymbol\pi\odot\boldsymbol B_{o_1}$ and $\boldsymbol\alpha_{t+1}=(\boldsymbol\alpha_t^TA)\odot\boldsymbol B_{o_{t+1}}$, where $\boldsymbol B_o$ is the column of $B$ for observation $o$ and $\odot$ is the element-wise product. Iterating yields $\alpha_T(i)$, and therefore

P(O|\lambda)=\sum_{i}\alpha_T(i)

For a model $\lambda$ with $N$ hidden states and an observation sequence $O$ of length $T$, computing $P(O|\lambda)$ takes $O(N^2T)$ time.


Python Implementation

import numpy as np


def forward_HMM(O, PI, A, B):
    """
    Given the model, compute the probability of the observation sequence.

    :param O: 1D, observation sequence (integer symbol indices)
    :param PI: 1D, initial state probability vector
    :param A: 2D, state transition matrix
    :param B: 2D, observation (emission) probability matrix
    :return: float, probability of O
    """
    PI = np.asarray(PI).ravel()
    A = np.asarray(A)
    B = np.asarray(B)

    # Forward probabilities at step 1: alpha_1(i) = pi_i * b_i(o_1)
    alphas = B[:, O[0]] * PI

    # Forward probabilities for steps 2..T: alpha_{t+1} = (alpha_t A) ⊙ B[:, o_{t+1}]
    for index in O[1:]:
        alphas = np.dot(alphas, A) * B[:, index]

    # Sum the final forward probabilities over all hidden states
    return alphas.sum()


if __name__ == '__main__':
    # Initial state probability vector
    PI = [0.2, 0.4, 0.4]
    # State transition matrix, N x N for N hidden states
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    # Observation probability matrix, N x M for N hidden states and M symbols
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    # Observation sequence (0 = red, 1 = white)
    O = [0, 1, 0]

    print(forward_HMM(O, PI, A, B))  # 0.130218

Backward Recursion

给定模型 λ \lambda ,在时刻 t t 隐藏态为 q i q_i 且时刻 t + 1 t+1 之后观测序列为 o t + 1 , , o T o_{t+1}, \cdots, o_T 的概率为后向概率,即
β t ( i ) = P ( o t + 1 , o t + 2 , , o T s t = q i , λ ) , P ( O λ ) = i π i b i ( o 1 ) β 1 ( i ) , β T ( i ) = 1 \beta_t(i) = P(o_{t+1},o_{t+2},\cdots,o_T|s_t = q_i, \lambda),\quad P(O|\lambda) = \sum_{i}\pi_i b_i(o_1) \beta_1(i),\quad \beta_{T}(i) = 1

By the homogeneous Markov property and the independent-output assumption, the backward recursion is

\begin{aligned}\beta_t(i) & = \sum_{j}P(o_{t+1},\cdots,o_T,s_{t+1}=q_j|s_t = q_i, \lambda) \\ & = \sum_{j}P(o_{t+1},\cdots,o_T|s_t = q_i,s_{t+1}=q_j, \lambda)\cdot P(s_{t+1}=q_j|s_t =q_i, \lambda) \\ & = \sum_{j}a_{ij}\cdot P(o_{t+1},\cdots,o_T| s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|o_{t+2},\cdots,o_T,s_{t+1}=q_j,\lambda)\cdot P(o_{t+2}, \cdots, o_T|s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|s_{t+1}=q_j,\lambda)\cdot P(o_{t+2},\cdots, o_T|s_{t+1}=q_j, \lambda) \\ & = \sum_{j}a_{ij}\cdot b_j(o_{t+1})\cdot \beta_{t+1}(j) \end{aligned}
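
A sketch mirroring forward_HMM above ($\beta$ is updated from $t=T-1$ down to $t=1$; on the box example it reproduces the forward result):

import numpy as np

def backward_HMM(O, PI, A, B):
    """P(O | lambda) via the backward recursion."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    betas = np.ones(A.shape[0])                  # beta_T(i) = 1
    for index in reversed(O[1:]):                # o_T, ..., o_2
        betas = A @ (B[:, index] * betas)        # beta_t(i) = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    return (PI * B[:, O[0]] * betas).sum()       # sum_i pi_i b_i(o_1) beta_1(i)

print(backward_HMM([0, 1, 0], [0.2, 0.4, 0.4],
                   [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],
                   [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]))  # 0.130218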


Relationship Between the Forward and Backward Algorithms

\begin{aligned} P(O|\lambda) & = \sum_{i}P(o_1, \cdots, o_t, s_t=q_i, o_{t+1}, \cdots, o_T|\lambda)\\ & = \sum_{i}P(o_{t+1}, \cdots, o_T | o_1, \cdots, o_t , s_t= q_i, \lambda)\cdot P(o_1, \cdots,o_t,s_t=q_i |\lambda) \\ & = \sum_{i}P(o_{t+1}, \cdots, o_T|s_t=q_i, \lambda)\cdot P(o_1, \cdots, o_t, s_t=q_i | \lambda) \\ & = \sum_{i}\alpha_t(i)\beta_t(i)=\sum_iP(s_t=q_i, O|\lambda) \end{aligned}

t = T 1 t=T-1 t = 1 t=1 时,上式分别表示前向和后向概率计算公式.

Some Probability Formulas

给定模型 λ \lambda 和观测序列 O O ,时刻 t t 处于状态 q i q_i 的概率,记作
γ t ( i ) = P ( s t = q i O , λ ) = P ( s t = q i , O λ ) P ( O λ ) = α t ( i ) β t ( i ) j α t ( j ) β t ( j ) \gamma_t(i) = P(s_t =q_i | O, \lambda) = \frac{P(s_t=q_i,O | \lambda)}{P(O|\lambda)}=\frac{\alpha_t(i)\beta_t(i)}{\displaystyle\sum_{j}\alpha_t(j)\beta_t(j)}

给定模型 λ \lambda 和观测序列 O O ,时刻 t t 处于状态 q i q_i 且时刻 t + 1 t+1 处于状态 q j q_j 的概率 ,记作
ξ t ( i , j ) = P ( s t = q i , s t + 1 = q j O , λ ) = P ( s t = q i , s t + 1 = q j , O λ ) i j P ( s t = q i , s t + 1 = q j , O λ ) \xi_t(i, j) = P(s_t=q_i, s_{t+1}=q_j|O, \lambda) = \frac{P(s_t=q_i, s_{t+1}=q_j,O| \lambda)}{\displaystyle\sum_i\sum_jP(s_t=q_i, s_{t+1}=q_j, O|\lambda)}

where $P(s_t=q_i, s_{t+1}=q_j, O|\lambda)=\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$.
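
Both quantities fall out of one forward-backward pass; a vectorized sketch (array shapes noted in comments, helper name illustrative), reused by the learning and decoding sections below:

import numpy as np

def forward_backward(O, PI, A, B):
    """Return gamma (T x N) and xi ((T-1) x N x N) for one observation sequence."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    T, N = len(O), A.shape[0]
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    PO = alpha[-1].sum()                              # P(O | lambda)
    gamma = alpha * beta / PO
    # xi[t, i, j] = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = alpha[:-1, :, None] * A * (B[:, O[1:]].T * beta[1:])[:, None, :] / PO
    return gamma, xi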


HMM Learning

Problem: given an observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely HMM parameters $\lambda=(A,B,\Pi)$.

Supervised Learning

With enough labelled data, i.e. counts $\#(m_j)$ of occurrences of hidden state $m_j$ and counts $\#(v_k,m_j)$ of $m_j$ emitting observation $v_k$, the parameters can be estimated as

a_{ij}\approx\frac{\#(m_i,m_j)}{\#(m_i)},\quad b_j(k)\approx\frac{\#(v_k,m_j)}{\#(m_j)},\quad \pi_i\approx\frac{\#(m_i)}{\displaystyle\sum_k \#(m_k)}

Many applications cannot provide such labels. In acoustic-model training for speech recognition, for instance, no human can determine the state sequence that produced a given utterance.

Expectation-Maximization Algorithm

The HMM probability model is
P(O|\lambda)=\sum_SP(O|S, \lambda)P(S|\lambda)

The Q-function of the EM algorithm is
Q(\lambda, \lambda')=\sum_SP(S|O,\lambda')\ln P(O,S|\lambda)\propto\sum_S P(O,S|\lambda')\ln P(O,S|\lambda)
Using the joint distribution of the state and observation sequences (the subscripts $i_t$ range over hidden state indices),
P(O,S|\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)

Q(\lambda, \lambda')=\sum_SP(O,S|\lambda')\ln\pi_{i_1}+ \sum_SP(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_{t}i_{t+1}}+ \sum_SP(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)

where

\begin{aligned} & \sum_SP(O,S|\lambda')\ln \pi_{i_1}=\sum_iP(O,s_1=q_i|\lambda')\ln\pi_{i},\quad\sum_i\pi_i=1\\ &\sum_SP(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_{t}i_{t+1}}=\sum_i\sum_j\sum_{t=1}^{T-1}P(O,s_t=q_i,s_{t+1}=q_j|\lambda')\ln a_{ij}\\ & \sum_SP(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)=\sum_j\sum_{t=1}^TP(O,s_t=q_j|\lambda')\ln b_j(o_t) \end{aligned}

Setting the partial derivatives with respect to $\pi_i$, $a_{ij}$, and $b_j(k)$ to zero (under the normalization constraints, and using the probability formulas of the previous section) gives

\pi_i = \frac{P(O, s_1=q_i|\lambda')}{P(O|\lambda')}=\gamma_1(i),\quad a_{ij}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)},\quad b_j(k)=\frac{\sum_{t=1,\,o_t=v_k}^T\gamma_t(j)}{\sum_{t=1}^T\gamma_t(j)}
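
Put together, one Baum-Welch iteration reuses the forward_backward helper sketched in the probability section; a minimal sketch, with the step repeated in practice until $P(O|\lambda)$ stops improving:

import numpy as np

def baum_welch_step(O, PI, A, B):
    """One EM update of (PI, A, B) from a single observation sequence."""
    O, B = np.asarray(O), np.asarray(B)
    gamma, xi = forward_backward(O, PI, A, B)    # helper sketched earlier
    PI_new = gamma[0]                                         # pi_i = gamma_1(i)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # sum_t xi_t / sum_t gamma_t
    B_new = np.stack([gamma[O == k].sum(axis=0)               # sum over t with o_t = v_k
                      for k in range(B.shape[1])], axis=1)
    return PI_new, A_new, B_new / gamma.sum(axis=0)[:, None]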


HMM Prediction / Decoding

Given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely hidden state sequence $S$, i.e. maximize $P(S|O, \lambda)$.

Greedy Approximation

给定 λ \lambda 和观测序列 O O ,时刻 t t 处于状态 q i q_i 的概率
γ t ( i ) = P ( s t = q i O , λ ) = α t ( i ) β t ( i ) j α t ( j ) β t ( j ) \gamma_t(i)=P(s_t=q_i | O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j}\alpha_t(j)\beta_t(j)}
At each time $t$, pick the individually most likely state $s_t^*$, giving a state sequence $S^*$:
S^*=(s_1^*,s_2^*,\cdots),\quad s_t^*=q_{k^*},\quad k^* = \arg\max_k\gamma_t(k)


Viterbi Algorithm

Dynamic-programming idea: any sub-path of an optimal path must itself be optimal. Define the maximum probability over all paths ending in state $s_t=q_i$ with observations $o_1,\cdots,o_t$ as
\delta_t(i) = \max_{s_1,\cdots,s_{t-1}}P(s_t=q_i, s_{t-1}, \cdots, s_1, o_t, \cdots, o_1|\lambda)

The recursion is
\delta_{t+1}(i)=\max_j\delta_t(j)a_{ji}b_i(o_{t+1}),\quad \delta_1(i) = \pi_ib_i(o_1)

Define the $t$-th node of the maximum-probability path that is in state $q_i$ at time $t+1$ as
i_{t} = \psi_{t+1}(i) = \arg\max_{j}\delta_{t}(j)a_{ji},\quad i_T=\arg\max_{i}\delta_T(i)

P ( S O , λ ) = max i δ T ( i ) P(S|O,\lambda)=\max_{i}\delta_T(i) .

For example, $\delta_3(i_1)=\max\{\delta_2(i_1)a_{11}b_{1}(o_3), \,\,\delta_2(i_2)a_{21}b_1(o_3),\,\, \delta_2(i_3)a_{31}b_1(o_3)\}$.

Example: using the model $\lambda = (A, B, \Pi)$ from the box example above and the observation sequence $O=(\text{red}, \text{white}, \text{red})$, find the optimal state sequence.

I. Initialization
At time $t=1$, the probability of each hidden state $q_i$ producing the observation red:
\delta_1(1)=0.2\times 0.5=0.1, \quad \delta_1(2)=0.4\times 0.4=0.16, \quad \delta_1(3)=0.4\times 0.7=0.28, \quad \psi_1(i)=0


II. Recursion
At time $t=2$, the maximum probability of being in state $q_1$ and observing white:
\delta_2(1)=\max_{1\leq j \leq 3}[\delta_1(j)a_{j1}]b_1(o_2) = \max\{0.1\times 0.5,\ 0.16\times 0.3,\ 0.28\times 0.2\}\times 0.5 = 0.028, \quad \psi_2(1)=3
Similarly, $\delta_2(2)=0.0504$, $\psi_2(2)=3$; $\delta_2(3)=0.042$, $\psi_2(3)=3$.

At time $t=3$, the maximum probability of being in state $q_j$ and observing red:
\delta_3(1)=0.00756,\ \psi_3(1)=2,\quad \delta_3(2)=0.01008,\ \psi_3(2)=2,\quad \delta_3(3)=0.0147,\ \psi_3(3)=3


III. Optimal path
P^* = \max_{1\leq i \leq 3} \delta_3(i)=0.0147
Hence $i_3 = 3$, $i_2 = \psi_3(i_3)=3$, $i_1 = \psi_2(i_2)=3$, and the optimal state sequence is $I=(i_1, i_2, i_3)=(3,3,3)$.
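
A sketch of the full recursion; on this example it reproduces the numbers above (states are 0-indexed in code, so $(3,3,3)$ prints as [2, 2, 2]):

import numpy as np

def viterbi(O, PI, A, B):
    """Most probable state path and its probability, via the delta/psi recursion."""
    PI, A, B = np.asarray(PI).ravel(), np.asarray(A), np.asarray(B)
    T, N = len(O), A.shape[0]
    delta = PI * B[:, O[0]]                      # delta_1(i) = pi_i b_i(o_1)
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        trans = delta[:, None] * A               # trans[j, i] = delta_t(j) a_ji
        psi[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) * B[:, O[t]]
    path = [int(delta.argmax())]                 # i_T = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))       # i_t = psi_{t+1}(i_{t+1})
    return delta.max(), path[::-1]

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
print(viterbi([0, 1, 0], PI, A, B))              # (0.0147, [2, 2, 2])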

In the following, let the hidden state sequence be $\boldsymbol s=(s_1, \cdots, s_n)$ and the observation sequence $\boldsymbol o=(o_1, \cdots, o_n)$.

Limitations of HMMs

An HMM models the joint distribution $P(S, O)$; its decoding/prediction problem is to find the state sequence $\boldsymbol s$ that maximizes $P(\boldsymbol s|\boldsymbol o, \lambda)$.

In an HMM, $s_i$ depends only on $s_{i-1}$, and $o_i$ depends only on $s_i$. When the observations carry many features, e.g. in NER the label $s_i$ depends not only on $o_i$ but also on the surrounding observations $o_j$ ($j\neq i$), such as the capitalization and part of speech of nearby words, an HMM cannot model the task.


Reposted from blog.csdn.net/sinat_34072381/article/details/107272200