Hidden Markov Model (HMM)

1. Basic Concepts

1.1 Definition of the HMM

A hidden Markov model (HMM) describes a process in which a hidden Markov chain randomly generates an unobservable sequence of states, and each of those states in turn generates an observation. The unobservable random sequence produced by the hidden Markov chain is called the state sequence; each state emits one observation, and the resulting random sequence is called the observation sequence.

For example, an input method infers the sentence the user actually wants to type (the state sequence) from the typed pinyin string (the observation sequence).

1.2 The Three Elements of an HMM

Let $Q = \{q_1,\cdots, q_N\}$ be the set of hidden states, $V = \{v_1, \cdots, v_M\}$ the set of observation symbols, $I = (i_1, \cdots, i_T)$ a state sequence, and $O = (o_1, \cdots, o_T)$ an observation sequence.

(1) State transition matrix
If the chain is in hidden state $q_i$ at time $t$ and in hidden state $q_j$ at time $t+1$, the transition probability from time $t$ to time $t+1$ is
$$a_{ij} = P(i_{t+1} = q_j \mid i_t = q_i), \quad i = 1, 2, \cdots, N;\ j = 1, 2, \cdots, N$$

The state transition matrix is $A = [a_{ij}]_{N \times N}$.

(2) Observation probability matrix
If the chain is in hidden state $q_j$ at time $t$ and the observation is $v_k$, the probability of generating observation $v_k$ from hidden state $q_j$ is
$$b_j(k) = P(o_t = v_k \mid i_t = q_j), \quad k = 1, 2, \cdots, M;\ j = 1, 2, \cdots, N$$

The observation probability matrix is $B = [b_j(k)]_{N\times M}$, where $N$ is the number of hidden states and $M$ the number of observation symbols.

(3) Initial state probability vector
The probability of being in state $q_i$ at the initial time $t=1$ is
$$\pi_i = P(i_1 = q_i), \quad i = 1, 2, \cdots, N$$

The initial state probabilities form the vector $\Pi = (\pi_i)$.

An HMM is fully determined by the initial state probability vector $\Pi$, the state transition matrix $A$, and the observation probability matrix $B$. $\Pi$ and $A$ determine the state sequence, while $B$ determines the observation sequence; the model is written as the triple $\lambda=(A,B,\Pi)$.

1.3 The Two Assumptions of an HMM

(1) Homogeneous Markov assumption: the hidden state of the chain at any time depends only on the hidden state at the previous time.
$$P(i_t \mid i_{t-1}, o_{t-1}, \cdots, i_1, o_1) = P(i_t \mid i_{t-1}), \quad t = 1, 2, \cdots, T$$

(2) Observation independence assumption: the observation at any time depends only on the hidden state of the chain at that time.
$$P(o_t \mid i_T, o_T, i_{T-1}, o_{T-1}, \cdots, i_{t+1}, o_{t+1}, i_t, i_{t-1}, o_{t-1}, \cdots, i_1, o_1) = P(o_t \mid i_t), \quad t = 1, 2, \cdots, T$$

1.4 An HMM Example

Suppose there are 4 boxes, each containing red and white balls, as follows:

Box          1   2   3   4
Red balls    5   3   6   8
White balls  5   7   4   2

Initially, one of the 4 boxes is chosen uniformly at random; a ball is drawn from it and put back; then we move from the current box to the next box according to the rule:
$$\begin{aligned} & P(X=2|X=1)=1\\ & P(X=1|X=2) = 0.4, \quad P(X=3|X=2) = 0.6\\ & P(X=2|X=3) = 0.4, \quad P(X=4|X=3) = 0.6\\ & P(X=3|X=4) = 0.5, \quad P(X=4|X=4) = 0.5 \end{aligned}$$

Repeating this 5 times yields an observation sequence of ball colors
$$O = (\text{red}, \text{red}, \text{white}, \text{white}, \text{red})$$

In this example, the sequence of boxes is the (unknown) state sequence and the sequence of ball colors is the (known) observation sequence. From the conditions above, the three HMM elements are:
(1) State transition matrix
$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0.4 & 0 & 0.6 & 0 \\ 0 & 0.4 & 0 & 0.6 \\ 0 & 0 & 0.5 & 0.5 \end{bmatrix}$$

(2) Observation probability matrix
$$B = \begin{bmatrix} 0.5 & 0.5 \\ 0.3 & 0.7 \\ 0.6 & 0.4 \\ 0.8 & 0.2 \end{bmatrix}$$

(3) Initial hidden state distribution
$$\Pi = (0.25, 0.25, 0.25, 0.25)^T$$
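The three elements above can be written down directly; a minimal sketch in plain Python (the variable names are mine; observations are coded 0 = red, 1 = white), checking that every row of $A$ and $B$ and the vector $\Pi$ is a probability distribution:

```python
# Three elements of the box-and-ball HMM (states = boxes 1..4).
A = [[0.0, 1.0, 0.0, 0.0],
     [0.4, 0.0, 0.6, 0.0],
     [0.0, 0.4, 0.0, 0.6],
     [0.0, 0.0, 0.5, 0.5]]
B = [[0.5, 0.5],   # box 1: 5 red, 5 white out of 10 balls
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]
Pi = [0.25, 0.25, 0.25, 0.25]

# Sanity check: every row is a probability distribution.
for row in A + B + [Pi]:
    assert abs(sum(row) - 1.0) < 1e-12
```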

1.5 The Three Basic Problems of HMMs

(1) Probability computation (evaluation)
Given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, compute the probability $P(O\mid\lambda)$ of observing $O$ under $\lambda$.

(2) Learning
Given an observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the model $\lambda=(A,B,\Pi)$ that maximizes $P(O\mid\lambda)$; this can be solved by maximum likelihood estimation.

(3) Prediction (decoding)
Given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely state sequence $I$, i.e. the one maximizing $P(I\mid O, \lambda)$.

2. Probability Computation

Given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, compute the probability $P(O\mid\lambda)$ of $O$ under $\lambda$.

2.1 Direct Computation

Since all transition probabilities between hidden states and all emission probabilities from hidden states to observations are known, $P(O\mid\lambda)$ can be computed directly from the probability formulas.

Enumerate every possible state sequence $I = (i_1, i_2, \cdots, i_T)$ of length $T$, compute the joint probability $P(O, I\mid\lambda)$ of each $I$ with the observation sequence $O = (o_1, o_2, \cdots, o_T)$, and sum to obtain the marginal $P(O\mid\lambda)$:
$$\begin{aligned} P(O|\lambda) & = \sum_I P(O, I|\lambda) = \sum_{I}P(O|I, \lambda)P(I|\lambda) \\ & = \sum_{i_1,i_2,\cdots,i_T} [b_{i_1}(o_1)b_{i_2}(o_2)\cdots b_{i_T}(o_T)]\,[\pi_{i_1}a_{i_1i_2}a_{i_2i_3}\cdots a_{i_{T-1}i_T}] \end{aligned}$$

There are $N^T$ sequences of length $T$ over $N$ states, so direct computation takes $O(TN^T)$ time, which is infeasible when the number of hidden states is large.

2.2 Forward Algorithm

The forward algorithm is a dynamic programming algorithm: by defining a local quantity, the "forward probability", we obtain a recurrence that extends solutions of subproblems to a solution of the full problem.
Given the model $\lambda$, the forward probability is the probability of observing $o_1, o_2, \cdots, o_t$ up to time $t$ and being in hidden state $q_i$ at time $t$:
$$\alpha_t(i) = P(o_1,o_2,\cdots,o_t,i_t=q_i|\lambda)$$

How do we obtain the dynamic programming recurrence?
Suppose the forward probabilities $\alpha_t(j)$ are known for every hidden state $q_j$ at time $t$. Then:

$\alpha_t(j)\,a_{ji}$ is the probability of observing $o_1, o_2, \cdots, o_t$ with hidden state $q_j$ at time $t$, and then transitioning to hidden state $q_i$ at time $t+1$;

$\sum_{j=1}^N \alpha_t(j)\,a_{ji}$ is the joint probability of observing $o_1, o_2, \cdots, o_t$ and being in hidden state $q_i$ at time $t+1$;

Therefore the forward probability $\alpha_{t+1}(i)$, i.e. the probability of being in hidden state $q_i$ at time $t+1$ having observed $o_1, o_2, \cdots, o_{t+1}$, is:
$$\alpha_{t+1}(i) = \left[\sum_{j=1}^N \alpha_t(j) a_{ji}\right] b_i(o_{t+1})$$

Finally, iterate up to $\alpha_T(i)$ and sum the forward probabilities over all possible hidden states $q_i$ at time $T$: $P(O\mid\lambda) = \sum_{i=1}^N \alpha_T(i)$.

Figure 1: recurrence for the forward probability

Proof: $\alpha_t(j) = P(o_1,o_2,\cdots,o_t,i_t=q_j|\lambda)$ and $a_{ji}= P(i_{t+1} = q_i|i_t = q_j)$; by the homogeneous Markov assumption
$$a_{ji}= P(i_{t+1} = q_i|i_t = q_j) = P(i_{t+1} = q_i|o_1, o_2, \cdots, o_t, i_t = q_j)$$

Therefore
$$\sum_{j=1}^N \alpha_t(j)a_{ji} =\sum_{j=1}^NP(o_1, o_2, \cdots, o_t, i_t = q_j, i_{t+1} = q_i|\lambda) = P(o_1, o_2, \cdots, o_t, i_{t+1} = q_i|\lambda)$$

Since $b_i(o_{t+1}) = P(o_{t+1}|i_{t+1}=q_i)$, and by the observation independence assumption $b_i(o_{t+1})=P(o_{t+1}|o_1, o_2, \cdots, o_t,i_{t+1}=q_i)$, we have
$$\alpha_{t+1}(i) = P(o_1,\cdots,o_t, o_{t+1},i_{t+1}=q_i|\lambda) = \left[\sum_{j=1}^N \alpha_t(j)a_{ji}\right]b_i(o_{t+1})$$

Time complexity of the forward algorithm
The forward algorithm saves computation because each step directly reuses the results of the previous time step (dynamic programming caching), avoiding repeated work.
If the model $\lambda$ has $N$ hidden states and the observation sequence $O$ has length $T$, computing $P(O\mid\lambda)$ takes $O(N^2T)$ time.

Steps of the forward algorithm
(1) Initialization: the joint probability of the initial hidden state $i_1=q_i$ and observation $o_1$
$$\alpha_1(i) = \pi_i b_i(o_1)$$
(2) Recursion: the forward probability that the partial observation sequence up to time $t+1$ is $o_1, o_2, \cdots, o_{t+1}$ and the hidden state is $q_i$
$$\alpha_{t+1}(i) = \left[\sum_{j=1}^N \alpha_t(j) a_{ji}\right] b_i(o_{t+1})$$

(3) Termination
$$P(O|\lambda) = \sum_{i=1}^N \alpha_T(i), \quad \alpha_T(i) = P(o_1, o_2, \cdots, o_T, i_T=q_i | \lambda)$$
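The three steps can be sketched directly in plain Python (function and variable names are mine); run on the model of the worked example that follows, it reproduces $P(O\mid\lambda) \approx 0.13022$:

```python
def forward(A, B, Pi, O):
    """Forward algorithm: returns P(O | lambda) and the table alpha."""
    N, T = len(Pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                      # (1) initialization
        alpha[0][i] = Pi[i] * B[i][O[0]]
    for t in range(T - 1):                  # (2) recursion
        for i in range(N):
            s = sum(alpha[t][j] * A[j][i] for j in range(N))
            alpha[t + 1][i] = s * B[i][O[t + 1]]
    return sum(alpha[T - 1]), alpha         # (3) termination

# Model of the worked example below (obs 0 = red, 1 = white).
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
Pi = [0.2, 0.4, 0.4]
p, alpha = forward(A, B, Pi, [0, 1, 0])
print(round(p, 5))  # 0.13022
```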

Worked example of the forward algorithm
Consider an HMM $\lambda = (A, B, \Pi)$ with state set $Q = \{1, 2, 3\}$, observation set $V = \{\text{red}, \text{white}\}$, and three elements
$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix}, \quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix}, \quad \Pi = (0.2, 0.4, 0.4)^T$$

Given $O = (\text{red}, \text{white}, \text{red})$, compute $P(O\mid\lambda)$.

Solution:
(1) Initial values
$$\alpha_1(1) = \pi_1b_1(o_1) = 0.10, \quad \alpha_1(2) = \pi_2b_2(o_1) = 0.16, \quad \alpha_1(3) = \pi_3b_3(o_1) = 0.28$$

(2) Recursion
$$\begin{aligned} & \alpha_2(1) = \left[\sum_{j=1}^3\alpha_1(j)a_{j1}\right]b_1(o_2) = 0.154\times 0.5 = 0.077\\ & \alpha_2(2) = \left[\sum_{j=1}^3\alpha_1(j)a_{j2}\right]b_2(o_2) = 0.184\times 0.6 = 0.1104\\ & \alpha_2(3) = \left[\sum_{j=1}^3\alpha_1(j)a_{j3}\right]b_3(o_2) = 0.202\times 0.3 = 0.0606\\ & \alpha_3(1) = \left[\sum_{j=1}^3\alpha_2(j)a_{j1}\right]b_1(o_3) = 0.04187\\ & \alpha_3(2) = \left[\sum_{j=1}^3\alpha_2(j)a_{j2}\right]b_2(o_3) = 0.03551\\ & \alpha_3(3) = \left[\sum_{j=1}^3\alpha_2(j)a_{j3}\right]b_3(o_3) = 0.05284 \end{aligned}$$

(3) Termination
$$P(O|\lambda) = \sum_{i=1}^3 \alpha_3(i)= 0.13022$$

2.3 Backward Algorithm

Given an HMM $\lambda$, the backward probability is the probability of the partial observation sequence $o_{t+1}, o_{t+2}, \cdots, o_T$ from time $t+1$ onward, given that the hidden state at time $t$ is $q_i$:
$$\beta_t(i) = P(o_{t+1},o_{t+2},\cdots,o_T|i_t = q_i, \lambda)$$

How do we obtain the dynamic programming recurrence?

$$\beta_t(i) = \sum_{j=1}^N a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

Figure 2: recurrence for the backward probability

Proof:
$$\begin{aligned}\beta_t(i) &= P(o_{t+1},o_{t+2},\cdots,o_T|i_t = q_i, \lambda) \\ & = \sum_{j=1}^N P(o_{t+1},o_{t+2},\cdots,o_T, i_{t+1}=q_j|i_t = q_i, \lambda) \\ & = \sum_{j=1}^N P(o_{t+1},o_{t+2},\cdots,o_T|i_t = q_i, i_{t+1}=q_j, \lambda) \cdot P(i_{t+1}=q_j | i_t =q_i, \lambda) \\ & = \sum_{j=1}^N P(o_{t+1},o_{t+2},\cdots,o_T| i_{t+1}=q_j, \lambda)\cdot a_{ij} \\ & = \sum_{j=1}^N P(o_{t+1}|o_{t+2}, o_{t+3}, \cdots, o_T, i_{t+1}=q_j, \lambda)\cdot P(o_{t+2}, o_{t+3}, \cdots, o_T|i_{t+1}=q_j, \lambda) \cdot a_{ij} \\ & = \sum_{j=1}^N P(o_{t+1}|i_{t+1}=q_j, \lambda)\cdot P(o_{t+2}, o_{t+3}, \cdots, o_T|i_{t+1}=q_j, \lambda) \cdot a_{ij} \\ & = \sum_{j=1}^N b_j(o_{t+1})\cdot \beta_{t+1}(j)\cdot a_{ij} \end{aligned}$$

The derivation repeatedly uses the product rule of conditional probability:

$$P(X,Y|Z) = P(X|Y, Z)\cdot P(Y|Z)$$

Steps of the backward algorithm
(1) Initialization
$$\beta_{T}(i) = 1, \quad i=1,2,\cdots, N$$

(2) Recursion
$$\beta_t(i) = \sum_{j=1}^N a_{ij} b_j(o_{t+1})\beta_{t+1}(j), \quad i = 1,2,\cdots, N$$

(3) Termination
$$P(O|\lambda) = \sum_{i=1}^N \pi_i b_i(o_1) \beta_1(i)$$
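The same sketch style works for the backward algorithm (plain Python, names are mine); on the example model of section 2.2 it reproduces the same $P(O\mid\lambda)$ as the forward algorithm:

```python
def backward(A, B, Pi, O):
    """Backward algorithm: returns P(O | lambda) and the table beta."""
    N, T = len(Pi), len(O)
    beta = [[1.0] * N for _ in range(T)]    # (1) beta_T(i) = 1
    for t in range(T - 2, -1, -1):          # (2) recursion, t = T-1, ..., 1
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    # (3) termination
    return sum(Pi[i] * B[i][O[0]] * beta[0][i] for i in range(N)), beta

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
Pi = [0.2, 0.4, 0.4]
p, beta = backward(A, B, Pi, [0, 1, 0])
print(round(p, 5))  # 0.13022
```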

2.4 A Unified Probability Formula

Since $P(O|\lambda) = P(o_1, o_2, \cdots, o_T | \lambda)$, we have
$$\begin{aligned} P(O|\lambda) & = \sum_{i=1}^NP(o_1, \cdots, o_t, i_t=q_i, o_{t+1}, \cdots, o_T|\lambda)\\ & = \sum_{i=1}^NP(o_{t+1}, \cdots, o_T | o_1, \cdots, o_t , i_t= q_i, \lambda)\cdot P(o_1, \cdots, o_t , i_t= q_i |\lambda) \\ & = \sum_{i=1}^N P(o_{t+1}, \cdots, o_T|i_t=q_i, \lambda)\cdot P(o_1, \cdots, o_t, i_t=q_i | \lambda) \\ & = \sum_{i=1}^N\alpha_t(i)\beta_t(i) = \sum_{i=1}^N\sum_{j=1}^N\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j), \quad t = 1, 2, \cdots, T-1 \end{aligned}$$

Setting $t=T-1$ and $t=1$ recovers the forward and backward formulas respectively.

Since $\alpha_t(i) = P(o_1, \cdots, o_t, i_t=q_i | \lambda)$ and $\beta_t(i) = P(o_{t+1}, \cdots, o_T|i_t=q_i, \lambda)$, it follows that
$$\begin{aligned} \alpha_t(i) \cdot \beta_t(i) & = P(o_1, \cdots, o_t, i_t=q_i | \lambda)\cdot P(o_{t+1}, \cdots, o_T|i_t=q_i, \lambda) \\ & = P(o_1, \cdots, o_t, i_t=q_i | \lambda)\cdot P(o_{t+1}, \cdots, o_T|o_1, \cdots, o_t, i_t=q_i, \lambda) \\ & = P(o_1, \cdots, o_T, i_t=q_i|\lambda) = P(i_t=q_i, O|\lambda) \end{aligned}$$

2.5 Some Probabilities and Expectations

(1) Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$:
$$\gamma_t(i) = P(i_t =q_i | O, \lambda) = \frac{P(i_t=q_i,O | \lambda)}{P(O|\lambda)}=\frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^N\alpha_t(j)\beta_t(j)}$$

(2) Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$ and in state $q_j$ at time $t+1$:
$$\xi_t(i, j) = P(i_t=q_i, i_{t+1}=q_j|O, \lambda) = \frac{P(i_t=q_i, i_{t+1}=q_j, O|\lambda)}{\sum_{i=1}^N\sum_{j=1}^NP(i_t=q_i, i_{t+1}=q_j, O|\lambda)}$$

where $P(i_t=q_i, i_{t+1}=q_j, O|\lambda)=\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$.
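Both quantities follow mechanically from the $\alpha$ and $\beta$ tables; a sketch (plain Python, names are mine, using the section 2.2 example model) that also checks $\sum_i \gamma_t(i) = 1$ and $\sum_{i,j}\xi_t(i,j) = 1$:

```python
def forward_table(A, B, Pi, O):
    N, T = len(Pi), len(O)
    alpha = [[Pi[i] * B[i][O[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                      for i in range(N)])
    return alpha

def backward_table(A, B, O, N):
    T = len(O)
    beta = [[1.0] * N]
    for t in range(T - 2, -1, -1):
        beta.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * beta[0][j] for j in range(N))
                        for i in range(N)])
    return beta

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
Pi = [0.2, 0.4, 0.4]
O = [0, 1, 0]
N, T = 3, 3
alpha, beta = forward_table(A, B, Pi, O), backward_table(A, B, O, N)

# gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
gamma = [[alpha[t][i] * beta[t][i] / sum(alpha[t][j] * beta[t][j] for j in range(N))
          for i in range(N)] for t in range(T)]
# xi_t(i, j) = alpha_t(i) a_ij b_j(o_{t+1}) beta_{t+1}(j) / P(O | lambda)
p = sum(alpha[T - 1])
xi = [[[alpha[t][i] * A[i][j] * B[j][O[t + 1]] * beta[t + 1][j] / p
        for j in range(N)] for i in range(N)] for t in range(T - 1)]

for t in range(T):
    assert abs(sum(gamma[t]) - 1.0) < 1e-12        # gamma_t is a distribution
for t in range(T - 1):
    assert abs(sum(map(sum, xi[t])) - 1.0) < 1e-12  # so is xi_t
```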

3. Learning Algorithms

3.1 Supervised Learning

Given $S$ observation sequences of equal length together with their state sequences $\{(O_1, I_1), \cdots, (O_S, I_S)\}$, the HMM parameters can be estimated by maximum likelihood.

(1) Estimating the transition probabilities
Let $A_{ij}$ be the number of transitions from hidden state $q_i$ to $q_j$ in the samples; then
$$A=[\hat a_{ij}], \quad \hat a_{ij}=\frac{A_{ij}}{\sum_{k=1}^NA_{ik}}$$

(2) Estimating the observation probabilities
Let $B_{jk}$ be the number of times the samples are in hidden state $q_j$ with observation $v_k$; then
$$B=[\hat b_j(k)], \quad \hat b_j(k)=\frac{B_{jk}}{\sum_{k'=1}^M B_{jk'}}$$

(3) Estimating the initial state probabilities
Let $C(i)$ be the number of samples whose initial hidden state is $q_i$; then
$$\Pi=(\hat\pi_i), \quad \hat\pi_i=\frac{C(i)}{\sum_{k=1}^N C(k)}$$
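The three frequency estimates amount to simple counting; a sketch (plain Python; the toy labeled data and all names are mine, invented purely for illustration):

```python
from collections import Counter

# Toy labeled data: (observations, hidden states), indices 0..N-1 / 0..M-1.
data = [([0, 1, 1], [0, 0, 1]),
        ([1, 0, 0], [1, 0, 0]),
        ([0, 0, 1], [0, 1, 1])]
N, M = 2, 2

trans, emit, init = Counter(), Counter(), Counter()
for O, I in data:
    init[I[0]] += 1                       # C(i): initial-state counts
    for t in range(len(I) - 1):
        trans[(I[t], I[t + 1])] += 1      # A_ij: transition counts
    for o, i in zip(O, I):
        emit[(i, o)] += 1                 # B_jk: emission counts

# Normalize each row of counts -> maximum likelihood estimates.
A_hat = [[trans[(i, j)] / sum(trans[(i, k)] for k in range(N))
          for j in range(N)] for i in range(N)]
B_hat = [[emit[(j, k)] / sum(emit[(j, k2)] for k2 in range(M))
          for k in range(M)] for j in range(N)]
Pi_hat = [init[i] / len(data) for i in range(N)]

for row in A_hat + B_hat + [Pi_hat]:
    assert abs(sum(row) - 1.0) < 1e-12
```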

3.2 The Baum-Welch Algorithm

Given $S$ observation sequences of length $T$, $\{O_1, O_2, \cdots, O_S\}$, with the hidden state sequences $I$ unknown, the HMM is a probabilistic model with latent variables:
$$P(O|\lambda)=\sum_IP(O|I, \lambda)\cdot P(I|\lambda)$$

Its parameters can be learned with the EM algorithm: the E step computes the expectation of $\log P(O, I|\lambda)$ under the posterior $P(I|O, \overline \lambda)$, where $\overline \lambda$ is the current parameter estimate; the M step then updates the parameters; the two steps are repeated until the parameters converge.

In the E step, with current parameters $\overline\lambda$, the expectation of $\log P(O, I|\lambda)$ under the posterior $P(I|O, \overline \lambda)$ is
$$L(\lambda, \overline\lambda) = \sum_IP(I|O, \overline\lambda) \log P(O, I|\lambda)$$

In the M step, maximizing this expression yields the updated parameters
$$\overline \lambda \leftarrow \arg \max_{\lambda} \sum_IP(I|O, \overline\lambda) \log P(O, I | \lambda)$$

4. Prediction Algorithms

4.1 Approximate Algorithm

Idea: at each time $t$, choose the state $i_t^*$ that is individually most likely at that time, and take the resulting sequence $I^* = (i_1^*, i_2^*, \cdots, i_T^*)$ as the prediction.
Given the HMM $\lambda$ and observations $O$, the probability of being in state $q_i$ at time $t$ is
$$\gamma_t(i)=P(i_t=q_i | O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j=1}^N\alpha_t(j)\beta_t(j)}$$

The most likely state at each time $t$ is
$$i_t^*=\arg\max_{1\leq i\leq N}[\gamma_t(i)], \quad t = 1, 2, \cdots, T$$

which yields the state sequence $I^* = (i_1^*, i_2^*, \cdots, i_T^*)$. The approximate algorithm does not guarantee that the predicted sequence is the most likely sequence as a whole: it ignores the transition probabilities between adjacent states, so the predicted sequence may even contain adjacent states whose transition probability is 0.
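Approximate (posterior) decoding is just an argmax over $\gamma_t(i)$ at each $t$; a sketch (plain Python, names mine). On the three-state example model used in sections 2.2 and 4.3 it returns states $(3, 2, 3)$, which differs from the Viterbi path $(3, 3, 3)$ of section 4.3, illustrating that a per-time-step argmax need not give the best overall sequence:

```python
def posterior_decode(A, B, Pi, O):
    """Approximate algorithm: pick argmax_i gamma_t(i) at every t independently."""
    N, T = len(Pi), len(O)
    al = [[Pi[i] * B[i][O[0]] for i in range(N)]]          # forward table
    for t in range(1, T):
        al.append([sum(al[-1][j] * A[j][i] for j in range(N)) * B[i][O[t]]
                   for i in range(N)])
    be = [[1.0] * N]                                       # backward table
    for t in range(T - 2, -1, -1):
        be.insert(0, [sum(A[i][j] * B[j][O[t + 1]] * be[0][j] for j in range(N))
                      for i in range(N)])
    # gamma_t(i) is proportional to alpha_t(i) * beta_t(i);
    # the normalizer is common per t, so the argmax does not need it.
    return [max(range(N), key=lambda i: al[t][i] * be[t][i]) for t in range(T)]

A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
Pi = [0.2, 0.4, 0.4]
path = posterior_decode(A, B, Pi, [0, 1, 0])
print(path)  # [2, 1, 2], i.e. states (3, 2, 3)
```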

4.2 The Viterbi Algorithm

The Viterbi algorithm solves the HMM state prediction problem by dynamic programming: it finds the maximum-probability path, where each path corresponds to a hidden state sequence.
Key observation: if the optimal path passes through node $i_t^*$ at time $t$, then its partial path from $i_t^*$ to $i_T^*$ must be optimal among all possible partial paths from $i_t^*$ to $i_T^*$.

Define $\delta_t(i)$ as the maximum probability over all single paths $(i_1, i_2, \cdots, i_t)$ ending in state $i$ at time $t$:
$$\delta_t(i) = \max_{i_1, i_2, \cdots, i_{t-1}}P(i_t=i, i_{t-1}, \cdots, i_1, o_t, \cdots, o_1|\lambda), \quad i=1,2,\cdots,N$$

Recurrence (the second line uses the product rule, the third uses the two HMM assumptions):
$$\begin{aligned} \delta_{t+1}(i) & =\max_{i_1, \cdots, i_t} P(i_{t+1}=i, i_t, \cdots, i_1, o_{t+1}, \cdots, o_1|\lambda) \\ & = \max_{i_1, \cdots, i_t} P(i_{t+1}=i, o_{t+1} |i_t, \cdots, i_1, o_t, \cdots, o_1, \lambda) \cdot P(i_t, \cdots, i_1, o_t, \cdots, o_1|\lambda)\\ & = \max_{i_1, \cdots, i_t} P(i_{t+1}=i, o_{t+1} |i_t, \lambda) \cdot P(i_t, \cdots, i_1, o_t, \cdots, o_1|\lambda) \\ & = \max_{1\leq j \leq N} P(i_{t+1}=i|i_t=q_j, \lambda) \cdot P(o_{t+1}|i_{t+1}=q_i, \lambda)\cdot \delta_t(j) \\ & = \max_{1\leq j \leq N} [\delta_t(j)a_{ji}]\, b_i(o_{t+1}), \quad i=1,2,\cdots,N;\ t= 1,2,\cdots,T-1 \end{aligned}$$

Define $\psi_t(i)$ as the $(t-1)$-th node of the maximum-probability single path $(i_1, i_2, \cdots, i_{t-1}, i)$ ending in state $i$ at time $t$:
$$\psi_t(i) = \arg\max_{1\leq j \leq N}[\delta_{t-1}(j)a_{ji}], \quad i = 1, 2, \cdots, N$$

Steps of the Viterbi algorithm
Input: model $\lambda=(A, B, \Pi)$ and observations $O=(o_1, o_2, \cdots, o_T)$
Output: optimal path $I^*=(i_1^*, i_2^*, \cdots, i_T^*)$
(1) Initialization
$$\delta_1(i) = \pi_ib_i(o_1),\quad \psi_1(i)=0,\quad i=1,2,\cdots,N$$

(2) Recursion, for $t=2,3,\cdots,T$
$$\delta_t(i)=\max_{1\leq j \leq N}[\delta_{t-1}(j)a_{ji}]b_i(o_t), \quad \psi_t(i)=\arg\max_{1\leq j \leq N}[\delta_{t-1}(j)a_{ji}], \quad i = 1,2,\cdots, N$$

(3) Termination
$$P^*=\max_{1\leq i \leq N}\delta_T(i), \quad i_T^*=\arg\max_{1\leq i \leq N}[\delta_T(i)]$$

(4) Backtracking the optimal path, for $t=T-1, T-2, \cdots,1$
$$i_t^*=\psi_{t+1}(i_{t+1}^*)$$
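The four steps map directly onto code; a sketch (plain Python, names mine), tested on the worked example of section 4.3 (obs 0 = red, 1 = white; states are 0-indexed in code):

```python
def viterbi(A, B, Pi, O):
    """Viterbi algorithm: most probable state path and its probability."""
    N, T = len(Pi), len(O)
    delta = [[Pi[i] * B[i][O[0]] for i in range(N)]]   # (1) initialization
    psi = [[0] * N]
    for t in range(1, T):                              # (2) recursion
        d, p = [], []
        for i in range(N):
            j_best = max(range(N), key=lambda j: delta[-1][j] * A[j][i])
            p.append(j_best)
            d.append(delta[-1][j_best] * A[j_best][i] * B[i][O[t]])
        delta.append(d)
        psi.append(p)
    best = max(range(N), key=lambda i: delta[-1][i])   # (3) termination
    path = [best]
    for t in range(T - 1, 0, -1):                      # (4) backtracking
        path.insert(0, psi[t][path[0]])
    return max(delta[-1]), path

# Model and observations of the worked example below.
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
Pi = [0.2, 0.4, 0.4]
p_star, path = viterbi(A, B, Pi, [0, 1, 0])
print(round(p_star, 4), [s + 1 for s in path])  # 0.0147 [3, 3, 3]
```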

4.3 A Worked Example of the Viterbi Algorithm

Consider an HMM $\lambda = (A, B, \Pi)$ with state set $Q = \{1, 2, 3\}$, observation set $V = \{\text{red}, \text{white}\}$, and three elements
$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix} ,\quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix} ,\quad \Pi=(0.2, 0.4, 0.4)^T$$

Given the observation sequence $O=(\text{red}, \text{white}, \text{red})$, find the optimal state sequence.

Solution:
(1) Initialization: at time $t=1$, for each hidden state $q_i$, the probability of being in $q_i$ and observing red
$$\delta_1(1)=0.2\times 0.5=0.1, \quad \delta_1(2)=0.4\times 0.4=0.16, \quad \delta_1(3)=0.4\times 0.7=0.28, \quad \psi_1(i)=0$$

(2) Recursion
At $t=2$, the maximum probability of being in state $j$ observing red at $t=1$ and then in state $i$ observing white at $t=2$:
$$\delta_2(1)=\max_{1\leq j \leq 3}[\delta_1(j)a_{j1}]b_1(o_2) = \max\{0.1\times 0.5,\ 0.16\times 0.3,\ 0.28\times 0.2\}\times 0.5 = 0.028, \quad \psi_2(1)=3$$

Similarly, $\delta_2(2)=0.0504,\ \psi_2(2)=3,\ \delta_2(3)=0.042,\ \psi_2(3)=3$.

At $t=3$: $\delta_3(1)=0.00756,\ \psi_3(1)=2,\ \delta_3(2)=0.01008,\ \psi_3(2)=2,\ \delta_3(3)=0.0147,\ \psi_3(3)=3$.

(3) Maximum-probability path
$$P^* = \max_{1\leq i \leq 3} \delta_3(i)=0.0147$$

Hence $i_3^* = 3$, and backtracking gives $i_2^* = \psi_3(i_3^*)=3$ and $i_1^* = \psi_2(i_2^*)=3$, so the optimal state sequence is $I^*=(i_1^*, i_2^*, i_3^*)=(3,3,3)$.


Reprinted from blog.csdn.net/sinat_34072381/article/details/83448902