The Communication Model
When a sender (a human or a machine) transmits information, the signal travels through a medium (air or wire); this is encoding in the broad sense. The receiver recovers the sender's information from the signal according to agreed rules; this is decoding in the broad sense.
Speech recognition is exactly this decoding process: the receiver reconstructs the sender's information from the received signal. Given the observed signals $o_1,o_2,\cdots$, how do we infer the source information $s_1,s_2,\cdots$? From a probabilistic viewpoint, we look among all candidate source sequences for the one most likely to have produced the observed signals.
By Bayes' theorem,
$$P(s_1,s_2,\cdots|o_1,o_2,\cdots)=\frac{P(o_1,o_2,\cdots|s_1,s_2,\cdots)P(s_1,s_2,\cdots)}{P(o_1,o_2,\cdots)}$$
Once the observations $o_1,o_2,\cdots$ have been produced they do not change, so $P(o_1,o_2,\cdots)$ is a constant, and the most likely source information is
$$s_1,s_2,\cdots =\arg\max_{s_1,s_2,\cdots}P(s_1,s_2,\cdots|o_1,o_2,\cdots)= \arg\max_{s_1,s_2,\cdots}P(o_1,o_2,\cdots|s_1,s_2,\cdots)P(s_1,s_2,\cdots)$$
This maximization can be solved with a hidden Markov model.
The Markov Assumption and Markov Processes
Suppose the sequence $s_1,s_2,\cdots,s_t,\cdots$ records the daily maximum temperature, where $s_t$ is the temperature random variable. Assume the distribution of state $s_t$ depends only on the previous state (today's maximum temperature depends only on yesterday's):
$$P(s_t|s_1,s_2,\cdots,s_{t-1})=P(s_t|s_{t-1})$$
This assumption is called the Markov assumption, and a stochastic process satisfying it is called a Markov process (representable as a directed graph, i.e., a Bayesian network).
(Figure: a state-transition diagram over states m1, m2, m3, m4, with edge probabilities 1.0, 0.6, 0.3, 0.4, 0.7.)
Pick an initial state at random, then generate successive states according to the transition rules; after $T$ steps this yields a state sequence $s_1,\cdots,s_T$. If the chain runs long enough, the transition probability from $m_i$ to $m_j$ can be estimated as $\#(m_i,m_j)/\#(m_i)$.
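This frequency estimate can be sketched as follows. The 2-state chain and its `A_true` matrix below are made up for illustration (they are not the diagram's m1-m4):

```python
import numpy as np

# Simulate a Markov chain, then estimate each transition probability
# as #(m_i, m_j) / #(m_i), as described above.
rng = np.random.default_rng(0)

A_true = np.array([[0.7, 0.3],       # assumed 2-state transition matrix
                   [0.4, 0.6]])

T = 100_000
states = np.empty(T, dtype=int)
states[0] = 0
for t in range(1, T):
    states[t] = rng.choice(2, p=A_true[states[t - 1]])

# Count transitions (m_i, m_j) and normalize each row by #(m_i).
counts = np.zeros((2, 2))
for i, j in zip(states[:-1], states[1:]):
    counts[i, j] += 1
A_est = counts / counts.sum(axis=1, keepdims=True)

print(np.round(A_est, 2))
```

With enough steps the estimated matrix converges to the true transition matrix.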
Hidden Markov Models and the Communication Model
A hidden Markov model describes a process in which a Markov chain generates an unobservable state sequence, which in turn generates an observation sequence. The hidden state sequence $s_1,s_2,\cdots$ is a standard Markov chain, hence the name "hidden" Markov model.
The two assumptions of an HMM:
Independent output assumption: at each time $t$ the HMM emits an observation $o_t$ that depends only on the hidden state $s_t$:
$$P(o_t|s_1,\cdots,s_{t},o_1,\cdots,o_{t-1})=P(o_t|s_{t})$$
Markov assumption: at each time $t$ the hidden state $s_t$ depends only on the previous hidden state $s_{t-1}$:
$$P(s_t|s_1,\cdots,s_{t-1},o_1,\cdots,o_{t-1})=P(s_t|s_{t-1})$$
By the Markov assumption and the independent output assumption, the joint probability of the state sequence and the observation sequence (a generative model) is
$$P(s_1,s_2,\cdots,o_1,o_2,\cdots)=\prod_tP(s_t|s_{t-1})\cdot P(o_t|s_t)$$
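As a quick check of this factorization, here is a minimal sketch on a hypothetical 2-state, 2-symbol model (all numbers below are made up for illustration):

```python
import numpy as np

PI = np.array([0.6, 0.4])            # assumed initial probabilities
A = np.array([[0.7, 0.3],            # assumed transition matrix
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],            # assumed emission matrix
              [0.2, 0.8]])

def joint_prob(S, O):
    """P(S, O) = pi_{s1} b_{s1}(o1) * prod_t a_{s_{t-1} s_t} b_{s_t}(o_t)."""
    p = PI[S[0]] * B[S[0], O[0]]
    for prev, cur, obs in zip(S[:-1], S[1:], O[1:]):
        p *= A[prev, cur] * B[cur, obs]
    return p

print(joint_prob([0, 1], [0, 1]))    # 0.6 * 0.9 * 0.3 * 0.8 = 0.1296
```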
The communication decoding problem can thus be solved with an HMM: the Viterbi algorithm maximizes the probability above and recovers the most likely hidden states.
HMM Model Representation
Let the hidden state set be $M = \{m_1,\cdots, m_N\}$, the observation symbol set $V = \{v_1, \cdots, v_M\}$, the hidden state sequence $S = (s_1, \cdots, s_T)$, and the observation sequence $O = (o_1, \cdots, o_T)$.
I. State transition matrix. If the hidden state is $m_i$ at time $t$ and $m_j$ at time $t+1$, the transition probability from time $t$ to time $t+1$ is
$$a_{ij} = P(s_{t+1} = m_j | s_t = m_i), \quad i,j = 1, 2, \cdots, N$$
The state transition matrix is $A = [a_{ij}]_{N \times N}$.
II. Observation probability matrix. If the hidden state at time $t$ is $m_j$, the probability of emitting observation symbol $v_k$ from $m_j$ is
$$b_j(k) = P(o_t = v_k | s_t = m_j), \quad k = 1,2,\cdots, M; \; j = 1, 2, \cdots, N$$
The observation probability matrix is $B = [b_j(k)]_{N\times M}$.
III. Initial state probability vector. The probability of being in state $m_i$ at the initial time $t=1$ is
$$\pi_i = P(s_1 = m_i), \quad i = 1, 2, \cdots, N$$
The initial state probability vector is $\Pi = (\pi_i)$.
In summary, $\Pi$ and $A$ determine the state sequence, and $B$ determines the observation sequence; an HMM is represented by the triple $\lambda=(A,B,\Pi)$.
Example: suppose there are 3 boxes, each containing red and white balls:

Box          1    2    3
Red balls    5    4    7
White balls  5    6    3
Choose one box at random according to the initial probabilities, draw one ball from it and put it back, then move to the next box according to the transition probabilities. For example, the transition probabilities out of box 1 are
$$P(X=1|X=1)=0.5,\quad P(X=2|X=1)=0.2,\quad P(X=3|X=1)=0.3$$
Repeating this 5 times yields an observation sequence of ball colors
$$O = \{\text{red}, \text{red}, \text{white}, \text{white}, \text{red}\}$$
Here the box sequence is the hidden state sequence, and the color sequence is the known observation sequence. The three HMM components are
$$A = \begin{bmatrix} 0.5 & 0.2 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.2 & 0.3 & 0.5 \end{bmatrix},\quad B = \begin{bmatrix} 0.5 & 0.5 \\ 0.4 & 0.6 \\ 0.7 & 0.3 \end{bmatrix},\quad \Pi=(0.2, 0.4, 0.4)^T$$
Note that the rows of $B$ match the ball counts: e.g., box 2 has 4 red and 6 white balls, giving $b_2 = (0.4, 0.6)$.
HMM Probability Computation
Problem: given the model $\lambda=(A,B,\Pi)$ and an observation sequence $O = (o_1, o_2, \cdots, o_T)$, compute the probability of $O$ under $\lambda$, i.e., $P(O|\lambda)$.
Can the probability of the observation sequence be computed by enumeration? Enumerate every state sequence $S = (s_1, s_2, \cdots, s_T)$, compute the joint probability $P(O, S|\lambda)$ of $S$ and the observation sequence $O = (o_1, o_2, \cdots, o_T)$, and sum:
$$P(O|\lambda) = \sum_S P(O, S|\lambda) = \sum_{S}P(O|S, \lambda)P(S|\lambda)$$
There are $N^T$ possible hidden state sequences, so direct computation costs $O(TN^T)$, which is infeasible for models with many hidden states.
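For small models the enumeration is still feasible and makes a useful sanity check; a sketch on the box example above, with $O$ = (red, white, red) encoded as `[0, 1, 0]`:

```python
from itertools import product
import numpy as np

# Brute-force P(O|lambda) by enumerating all N^T state sequences.
PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5],
              [0.4, 0.6],
              [0.7, 0.3]])
O = [0, 1, 0]

total = 0.0
for S in product(range(3), repeat=len(O)):      # 3^3 = 27 sequences
    p = PI[S[0]] * B[S[0], O[0]]
    for t in range(1, len(O)):
        p *= A[S[t - 1], S[t]] * B[S[t], O[t]]
    total += p

print(round(total, 6))    # 0.130218
```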
The Forward Recursion
The forward algorithm is a dynamic-programming algorithm: by defining a local quantity, the forward probability, we obtain a recursion that extends solutions of subproblems to the full problem. Given the model $\lambda$, the forward probability is the probability of observing $o_1, \cdots, o_t$ up to time $t$ with hidden state $s_t=q_i$:
$$\alpha_t(i) = P(o_1,\cdots,o_t,s_t=q_i|\lambda),\quad P(O|\lambda) = \sum_{i}\alpha_T(i),\quad \alpha_1(i) = \pi_i b_i(o_1)$$
By the homogeneous Markov and observation-independence assumptions, the forward recursion is
$$\begin{aligned} \alpha_{t+1}(i) &=P(o_1,\cdots,o_t,o_{t+1},s_{t+1}=q_i|\lambda)\\ &=\sum_jP(o_1,\cdots,o_t,o_{t+1},s_t=q_j,s_{t+1}=q_i|\lambda)\\ &=\sum_jP(s_{t+1}=q_i,o_{t+1}|o_1,\cdots,o_t,s_t=q_j,\lambda)P(o_1,\cdots,o_t,s_t=q_j|\lambda)\\ &=\sum_jP(s_{t+1}=q_i,o_{t+1}|s_t=q_j,\lambda)\alpha_t(j)\\ &=\sum_jP(o_{t+1}|s_t=q_j,s_{t+1}=q_i,\lambda)P(s_{t+1}=q_i|s_t=q_j,\lambda)\alpha_t(j)\\ &=\left[\sum_{j} \alpha_t(j) a_{ji}\right] b_i(o_{t+1}) \end{aligned}$$
The recursion computes $P(O|\lambda)$ along the path structure of the state sequences, caching the solutions of subproblems to avoid recomputation and thus speeding up the calculation.
In matrix form, $\boldsymbol\alpha_1=\boldsymbol\pi\odot\boldsymbol B_{o_1}$ and $\boldsymbol\alpha_{t+1}=(\boldsymbol\alpha_t^TA)\odot\boldsymbol B_{o_{t+1}}$, where $\boldsymbol B_{o_t}$ is the column of $B$ for observation $o_t$ and $\odot$ is the element-wise product. Iterating to the last step yields $\alpha_T(i)$, and therefore $P(O|\lambda)=\sum_{i}\alpha_T(i)$.
For a model $\lambda$ with $N$ hidden states and an observation sequence $O$ of length $T$, computing $P(O|\lambda)$ this way takes $O(N^2T)$ time.
Python implementation
```python
import numpy as np

def forward_HMM(O, PI, A, B):
    """Forward algorithm: probability of an observation sequence under the model.

    :param O: 1D, observation sequence (integer symbol indices)
    :param PI: 1D, initial probability vector
    :param A: 2D, state transition matrix
    :param B: 2D, observation probability matrix
    :return: float, probability of O
    """
    PI = np.asarray(PI).ravel()
    A = np.asarray(A)
    B = np.asarray(B)
    alphas = B[:, O[0]] * PI                       # alpha_1(i) = pi_i b_i(o_1)
    for index in O[1:]:
        alphas = np.dot(alphas, A) * B[:, index]   # forward recursion
    return alphas.sum()

if __name__ == '__main__':
    PI = [0.2, 0.4, 0.4]
    A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
    B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
    O = [0, 1, 0]
    print(forward_HMM(O, PI, A, B))    # 0.130218
```
The Backward Recursion
Given the model $\lambda$, the backward probability is the probability of observing $o_{t+1}, \cdots, o_T$ from time $t+1$ onward, given hidden state $q_i$ at time $t$:
$$\beta_t(i) = P(o_{t+1},o_{t+2},\cdots,o_T|s_t = q_i, \lambda),\quad P(O|\lambda) = \sum_{i}\pi_i b_i(o_1) \beta_1(i),\quad \beta_{T}(i) = 1$$
By the homogeneous Markov and observation-independence assumptions, the backward recursion is
$$\begin{aligned}\beta_t(i) & = \sum_{j}P(o_{t+1},\cdots,o_T,s_{t+1}=q_j|s_t = q_i, \lambda) \\ & = \sum_{j}P(o_{t+1},\cdots,o_T|s_t = q_i,s_{t+1}=q_j, \lambda)\cdot P(s_{t+1}=q_j|s_t =q_i, \lambda) \\ & = \sum_{j}a_{ij}\cdot P(o_{t+1},\cdots,o_T| s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|o_{t+2},\cdots,o_T,s_{t+1}=q_j,\lambda)\cdot P(o_{t+2}, \cdots, o_T|s_{t+1}=q_j, \lambda)\\ & = \sum_{j}a_{ij}\cdot P(o_{t+1}|s_{t+1}=q_j,\lambda)\cdot P(o_{t+2},\cdots, o_T|s_{t+1}=q_j, \lambda) \\ & = \sum_{j}a_{ij}\cdot b_j(o_{t+1})\cdot \beta_{t+1}(j) \end{aligned}$$
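A sketch of the backward recursion, again on the box example with $O$ = (red, white, red); it should give the same $P(O|\lambda)$ as the forward algorithm:

```python
import numpy as np

def backward_HMM(O, PI, A, B):
    """Backward algorithm: P(O|lambda) via the beta recursion."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    betas = np.ones(A.shape[0])               # beta_T(i) = 1
    for index in reversed(O[1:]):             # t = T-1, ..., 1
        betas = A @ (B[:, index] * betas)     # beta_t = sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    return np.sum(PI * B[:, O[0]] * betas)    # sum_i pi_i b_i(o_1) beta_1(i)

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
O = [0, 1, 0]
print(backward_HMM(O, PI, A, B))    # 0.130218, matching the forward algorithm
```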
Relation Between the Forward and Backward Algorithms
$$\begin{aligned} P(O|\lambda) & = \sum_{i}P(o_1, \cdots, o_t, s_t=q_i, o_{t+1}, \cdots, o_T|\lambda)\\ & = \sum_{i}P(o_{t+1}, \cdots, o_T | o_1, \cdots, o_t , s_t= q_i, \lambda)\cdot P(o_1, \cdots,o_t,s_t=q_i |\lambda) \\ & = \sum_{i}P(o_{t+1}, \cdots, o_T|s_t=q_i, \lambda)\cdot P(o_1, \cdots, o_t, s_t=q_i | \lambda) \\ & = \sum_{i}\alpha_t(i)\beta_t(i)=\sum_iP(s_t=q_i, O|\lambda) \end{aligned}$$
When $t=T$ this reduces to the forward formula $P(O|\lambda)=\sum_i\alpha_T(i)$ (since $\beta_T(i)=1$), and when $t=1$ to the backward formula $P(O|\lambda)=\sum_i\pi_i b_i(o_1)\beta_1(i)$.
Some Probability Formulas
Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$ is
$$\gamma_t(i) = P(s_t =q_i | O, \lambda) = \frac{P(s_t=q_i,O | \lambda)}{P(O|\lambda)}=\frac{\alpha_t(i)\beta_t(i)}{\displaystyle\sum_{j}\alpha_t(j)\beta_t(j)}$$
Given the model $\lambda$ and observation sequence $O$, the probability of being in state $q_i$ at time $t$ and state $q_j$ at time $t+1$ is
$$\xi_t(i, j) = P(s_t=q_i, s_{t+1}=q_j|O, \lambda) = \frac{P(s_t=q_i, s_{t+1}=q_j,O| \lambda)}{\displaystyle\sum_i\sum_jP(s_t=q_i, s_{t+1}=q_j, O|\lambda)}$$
where $P(s_t=q_i, s_{t+1}=q_j, O|\lambda)=\alpha_t(i)a_{ij}b_j(o_{t+1})\beta_{t+1}(j)$.
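These quantities can be computed from the full $\alpha$ and $\beta$ tables; a sketch on the box example with $O$ = (red, white, red):

```python
import numpy as np

PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = [0, 1, 0]
T, N = len(O), len(PI)

# Full forward table alpha[t, i] and backward table beta[t, i].
alpha = np.zeros((T, N))
alpha[0] = PI * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]

beta = np.ones((T, N))
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

P_O = alpha[-1].sum()
gamma = alpha * beta / P_O                       # gamma[t, i]
xi = np.array([                                  # xi[t, i, j], for t = 1..T-1
    alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1] / P_O
    for t in range(T - 1)
])

print(gamma[0])    # posterior over states at t = 1
```

Each row of `gamma` sums to 1, and summing `xi[t]` over $j$ recovers `gamma[t]`.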
HMM Model Learning
Problem: given an observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely HMM parameters $\lambda=(A,B,\Pi)$.
Supervised learning
With enough labeled data, i.e., known counts $\#(m_j)$ of occurrences of hidden state $m_j$ and counts $\#(v_k,m_j)$ of $m_j$ emitting observation $v_k$, the parameters can be estimated as
$$a_{ij}\approx\frac{\#(m_i,m_j)}{\#(m_i)},\quad b_j(k)\approx\frac{\#(v_k,m_j)}{\#(m_j)},\quad \pi_i\approx\frac{\#(m_i)}{\displaystyle\sum_k \#(m_k)}$$
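The counting estimates can be sketched directly; the labeled (state sequence, observation sequence) pairs below are hypothetical toy data:

```python
from collections import Counter

pairs = [([0, 0, 1], [0, 1, 1]),     # (hidden states, observations)
         ([1, 0, 0], [1, 0, 0])]

trans, emit, init, state_cnt = Counter(), Counter(), Counter(), Counter()
for S, O in pairs:
    init[S[0]] += 1
    for s, o in zip(S, O):
        state_cnt[s] += 1
        emit[(s, o)] += 1
    for i, j in zip(S[:-1], S[1:]):
        trans[(i, j)] += 1

# a_ij ~ #(m_i, m_j) / #(m_i), counting transitions out of m_i.
n_out = Counter()
for (i, j), c in trans.items():
    n_out[i] += c
a_01 = trans[(0, 1)] / n_out[0]
print(a_01)    # 1 transition 0->1 out of 3 transitions from state 0
```

The emission and initial-state estimates follow the same pattern of normalized counts.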
In many applications, however, such labels are unobtainable; for example, when training acoustic models for speech recognition, no human can determine the state sequence that produced a given utterance.
Expectation-Maximization Algorithm
The HMM probability model is
$$P(O|\lambda)=\sum_SP(O|S, \lambda)P(S|\lambda)$$
The Q-function of the EM algorithm is
$$Q(\lambda, \lambda')=\sum_SP(S|O,\lambda')\ln P(O,S|\lambda)\propto\sum_S P(O,S|\lambda')\ln P(O,S|\lambda)$$
(since $P(S|O,\lambda')=P(O,S|\lambda')/P(O|\lambda')$ and $P(O|\lambda')$ does not depend on $\lambda$). Substituting the joint distribution of the state and observation sequences (the subscript $i_t$ denotes an arbitrary hidden-state index)
$$P(O,S|\lambda)=\pi_{i_1}b_{i_1}(o_1)a_{i_1i_2}b_{i_2}(o_2)\cdots a_{i_{T-1}i_T}b_{i_T}(o_T)$$
gives
$$Q(\lambda, \lambda')=\sum_SP(O,S|\lambda')\ln\pi_{i_1}+ \sum_SP(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_{t}i_{t+1}}+ \sum_SP(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)$$
where
$$\begin{aligned} & \sum_SP(O,S|\lambda')\ln \pi_{i_1}=\sum_iP(O,s_1=q_i|\lambda')\ln\pi_{i},\quad\sum_i\pi_i=1\\ &\sum_SP(O,S|\lambda')\sum_{t=1}^{T-1}\ln a_{i_{t}i_{t+1}}=\sum_i\sum_j\sum_{t=1}^{T-1}P(O,s_t=q_i,s_{t+1}=q_j|\lambda')\ln a_{ij}\\ & \sum_SP(O,S|\lambda')\sum_{t=1}^T\ln b_{i_t}(o_t)=\sum_j\sum_{t=1}^TP(O,s_t=q_j|\lambda')\ln b_j(o_t) \end{aligned}$$
Setting the partial derivatives with respect to $\pi_i$, $a_{ij}$, and $b_j(k)$ to zero (under the normalization constraints, and using the probability formulas of the previous section) gives
$$\pi_i = \frac{P(O, s_1=q_i|\lambda')}{P(O|\lambda')}=\gamma_1(i),\quad a_{ij}=\frac{\sum_{t=1}^{T-1}\xi_t(i,j)}{\sum_{t=1}^{T-1}\gamma_t(i)},\quad b_j(k)=\frac{\sum_{t=1,o_t=v_k}^T\gamma_t(j)}{\sum_{t=1}^T\gamma_t(j)}$$
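One Baum-Welch (EM) iteration built from these update formulas; a sketch on the box example, checking that the likelihood does not decrease:

```python
import numpy as np

def forward_backward(O, PI, A, B):
    """Full alpha and beta tables for one observation sequence."""
    T, N = len(O), len(PI)
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = PI * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return alpha, beta

def baum_welch_step(O, PI, A, B):
    """One EM update of (PI, A, B) from the gamma/xi statistics."""
    T = len(O)
    alpha, beta = forward_backward(O, PI, A, B)
    P_O = alpha[-1].sum()
    gamma = alpha * beta / P_O
    xi = np.array([alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1] / P_O
                   for t in range(T - 1)])
    PI_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[np.array(O) == k].sum(axis=0)
                      for k in range(B.shape[1])], axis=1) / gamma.sum(axis=0)[:, None]
    return PI_new, A_new, B_new

PI = np.array([0.2, 0.4, 0.4])
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
B = np.array([[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]])
O = [0, 1, 0]

p_old = forward_backward(O, PI, A, B)[0][-1].sum()
PI2, A2, B2 = baum_welch_step(O, PI, A, B)
p_new = forward_backward(O, PI2, A2, B2)[0][-1].sum()
print(p_old, p_new)    # the likelihood is non-decreasing under EM
```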
HMM Prediction/Decoding
Given the model $\lambda=(A,B,\Pi)$ and observation sequence $O = (o_1, o_2, \cdots, o_T)$, find the most likely hidden state sequence $S$, i.e., maximize $P(S|O, \lambda)$.
Greedy approximation algorithm
Given $\lambda$ and an observation sequence $O$, the probability of being in state $q_i$ at time $t$ is
$$\gamma_t(i)=P(s_t=q_i | O, \lambda) = \frac{\alpha_t(i)\beta_t(i)}{\sum_{j}\alpha_t(j)\beta_t(j)}$$
Choosing the individually most likely state $s_t^*$ at each time $t$ yields the state sequence
$$S^*=(s_1^*,s_2^*,\cdots),\quad s_t^*=q_k = \arg\max_k\gamma_t(k)$$
This greedy choice maximizes each state marginally but ignores transitions, so the resulting sequence may even contain transitions of probability zero; the Viterbi algorithm below finds the globally optimal sequence.
Viterbi Algorithm
The DP idea: any sub-path of an optimal path must itself be optimal. Let $\delta_t(i)$ be the maximum probability, over all paths ending in state $s_t=q_i$, of the partial observations $o_1,\cdots,o_t$:
$$\delta_t(i) = \max_{s_1,\cdots,s_{t-1}}P(s_t=q_i, s_{t-1}, \cdots, s_1, o_t, \cdots, o_1|\lambda)$$
Recursion:
$$\delta_{t+1}(i)=\max_j\delta_t(j)a_{ji}\,b_i(o_{t+1}),\quad \delta_1(i) = \pi_i b_i(o_1)$$
Define $\psi_{t+1}(i)$ as the node at time $t$ on the maximum-probability path that is in state $q_i$ at time $t+1$:
$$i_{t} = \psi_{t+1}(i) = \arg\max_{j}\delta_{t}(j)a_{ji},\quad i_T=\arg\max_{i}\delta_T(i)$$
Then the probability of the optimal path is $\max_{i}\delta_T(i)$; maximizing $P(S|O,\lambda)$ is equivalent to maximizing this joint probability, since $P(O|\lambda)$ is fixed for a given $O$.
For example,
$$\delta_3(i_1)=\max\{\delta_2(i_1)a_{11}b_{1}(o_3), \,\,\delta_2(i_2)a_{21}b_1(o_3),\,\, \delta_2(i_3)a_{31}b_1(o_3)\}$$
Example: using the model $\lambda = (A, B, \Pi)$ from the box example above, given the observation sequence $O=(\text{red}, \text{white}, \text{red})$, find the optimal state sequence.
I. Initialization. At time $t=1$, for each hidden state $q_i$, the probability of observing red:
$$\delta_1(1)=0.2\times 0.5=0.1, \quad \delta_1(2)=0.4\times 0.4=0.16, \quad \delta_1(3)=0.4\times 0.7=0.28, \quad \psi_1(i)=0$$
II. Recursion. At time $t=2$, the maximum probability of being in state $q_1$ and observing white:
$$\delta_2(1)=\max_{1\leq j \leq 3}[\delta_1(j)a_{j1}]\,b_1(o_2) = \max\{0.1\times 0.5,\ 0.16\times 0.3,\ 0.28\times 0.2\}\times 0.5 = 0.028, \quad \psi_2(1)=3$$
Similarly, $\delta_2(2)=0.0504,\ \psi_2(2)=3$ and $\delta_2(3)=0.042,\ \psi_2(3)=3$.
At time $t=3$, the maximum probability of being in state $q_j$ and observing red:
$$\delta_3(1)=0.00756,\ \psi_3(1)=2,\quad \delta_3(2)=0.01008,\ \psi_3(2)=2,\quad \delta_3(3)=0.0147,\ \psi_3(3)=3$$
III. Optimal path.
$$P^* = \max_{1\leq i \leq 3} \delta_3(i)=0.0147$$
Hence $i_3 = 3$, $i_2 = \psi_3(i_3)=3$, $i_1 = \psi_2(i_2)=3$, so the optimal state sequence is $I=(i_1, i_2, i_3)=(3,3,3)$.
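The recursion and backtracking can be sketched as follows, reproducing the worked example (states are 0-indexed in the code and printed 1-indexed):

```python
import numpy as np

def viterbi(O, PI, A, B):
    """Viterbi decoding: returns (max path probability, best state path)."""
    PI, A, B = np.asarray(PI), np.asarray(A), np.asarray(B)
    T, N = len(O), len(PI)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = PI * B[:, O[0]]                   # delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A        # trans[j, i] = delta_t(j) a_ji
        psi[t] = trans.argmax(axis=0)            # best predecessor for each i
        delta[t] = trans.max(axis=0) * B[:, O[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return delta[-1].max(), path[::-1]

PI = [0.2, 0.4, 0.4]
A = [[0.5, 0.2, 0.3], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]]
B = [[0.5, 0.5], [0.4, 0.6], [0.7, 0.3]]
p_star, path = viterbi([0, 1, 0], PI, A, B)
print(p_star, [i + 1 for i in path])    # 0.0147 and [3, 3, 3], as in the example
```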
Below, write the hidden state sequence as $\boldsymbol s=(s_1, \cdots, s_n)$ and the observation sequence as $\boldsymbol o=(o_1, \cdots, o_n)$.
Limitations of HMMs
An HMM models the joint distribution $P(S, O)$; the decoding/prediction problem is to find the state sequence $\boldsymbol s$ that maximizes $P(\boldsymbol s|\boldsymbol o, \lambda)$.
In an HMM, $s_i$ depends only on $s_{i-1}$, and $o_i$ depends only on $s_i$. When the observations must be described by many features (for example, in NER the tag $s_i$ depends not only on $o_i$ but also on the surrounding observations $o_j\,(j\neq i)$, through features such as the capitalization and part of speech of neighboring words), an HMM cannot handle the task.