Whiteboard Derivation Series (Pytorch): Hidden Markov Model, the Evaluation Problem

HMM blog series index

The Probability Computation Problem (Evaluation)

Given a model $\lambda=(A, B, \pi)$ and an observation sequence $O=(o_{1}, o_{2}, \cdots, o_{T})$, compute the probability $P(O \mid \lambda)$ of the observation sequence $O$ under the model $\lambda$.

Direct Computation

$$P(O|\lambda) = \sum_{I}P(O,I|\lambda) = \sum_{I}P(O|I,\lambda)\cdot P(I|\lambda)$$
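The sum runs over all $N^T$ possible state sequences $I$, so direct evaluation costs $O(TN^T)$ time and is only feasible for tiny problems. A minimal brute-force sketch (the function name `brute_force` is illustrative; the parameters are the example values defined later in this post):

```python
import itertools
import numpy as np

# Example HMM parameters (same values as used later in this post)
Pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([[0, 1, 0, 0],
              [0.4, 0, 0.6, 0],
              [0, 0.4, 0, 0.6],
              [0, 0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])
O = [1, 1, 0, 0, 1]

def brute_force(O, Pi, A, B):
    """Sum P(O, I | lambda) over every possible state sequence I."""
    N, T = len(Pi), len(O)
    total = 0.0
    for I in itertools.product(range(N), repeat=T):
        # P(I | lambda): initial distribution times transition probabilities
        p = Pi[I[0]]
        for t in range(1, T):
            p *= A[I[t - 1]][I[t]]
        # P(O | I, lambda): emission probabilities along the sequence
        for t in range(T):
            p *= B[I[t]][O[t]]
        total += p
    return total

print(brute_force(O, Pi, A, B))
```

With $N=4$ and $T=5$ this enumerates only $4^5=1024$ sequences, but the count explodes exponentially as $T$ grows, which is what motivates the forward and backward algorithms below.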

Forward Algorithm

Define the forward probability as
$$\alpha_t(i) = P(o_1,o_2,...,o_t,i_t=q_i|\lambda)$$
Then the quantity we want is
$$P(O|\lambda) = P(o_1,o_2,...,o_T|\lambda) = \sum_{i=1}^N P(o_1,o_2,...,o_T,i_T=q_i|\lambda) = \sum_{i=1}^N \alpha_T(i)$$

Algorithm steps:

(1) Initialization
$$\alpha_1(i) = P(o_1,i_1=q_i|\lambda) = P(i_1=q_i|\lambda)\cdot P(o_1|i_1=q_i,\lambda) = \pi_i b_i(o_1)$$
(2) Recursion
$$\begin{aligned} \alpha_{t+1}(i) &= P(o_1,o_2,...,o_t,o_{t+1},i_{t+1}=q_i|\lambda) \\ &= P(o_1,o_2,...,o_t,i_{t+1}=q_i|\lambda)\cdot P(o_{t+1}|o_1,o_2,...,o_t,i_{t+1}=q_i,\lambda) \\ &= \Big[\sum_{j=1}^N P(o_1,o_2,...,o_t,i_t=q_j,i_{t+1}=q_i|\lambda)\Big]\cdot P(o_{t+1}|i_{t+1}=q_i) \\ &= \Big[\sum_{j=1}^N P(o_1,o_2,...,o_t,i_t=q_j|\lambda)\cdot P(i_{t+1}=q_i|o_1,o_2,...,o_t,i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\ &= \Big[\sum_{j=1}^N \alpha_t(j)\cdot P(i_{t+1}=q_i|i_t=q_j,\lambda)\Big]\cdot b_i(o_{t+1}) \\ &= \Big[\sum_{j=1}^N \alpha_t(j)\cdot a_{ji}\Big]\cdot b_i(o_{t+1}) \end{aligned}$$
(3) Termination
$$P(O|\lambda) = \sum_{i=1}^N \alpha_T(i)$$

Implementation

Define the observation data and the model parameters $\lambda=(\pi, A, B)$:

import numpy as np

O = [1,1,0,0,1]
Pi= [0.25,0.25,0.25,0.25]
A = [
    [0, 1, 0, 0 ],
    [0.4,0,0.6,0],
    [0,0.4,0,0.6],
    [0,0,0.5,0.5]
]
B = [
    [0.5,0.5],
    [0.3,0.7],
    [0.6,0.4],
    [0.8,0.2]
]
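As a quick sanity check (not part of the original post): every row of $A$ is a conditional distribution $P(i_{t+1}=\cdot \mid i_t=q_i)$ and every row of $B$ is $P(o_t=\cdot \mid i_t=q_i)$, so each row should sum to 1:

```python
import numpy as np

A = [[0, 1, 0, 0],
     [0.4, 0, 0.6, 0],
     [0, 0.4, 0, 0.6],
     [0, 0, 0.5, 0.5]]
B = [[0.5, 0.5],
     [0.3, 0.7],
     [0.6, 0.4],
     [0.8, 0.2]]

# Each row of A and of B must be a valid probability distribution
assert np.allclose(np.sum(A, axis=1), 1.0)
assert np.allclose(np.sum(B, axis=1), 1.0)
```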

Define the model

class ForwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of hidden states

    def predict(self, O):
        # Initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = np.zeros(shape=(self.N,), dtype=float)
        for i in range(self.N):
            alpha[i] = self.Pi[i] * self.B[i][O[0]]
        # Recursion over o_2, ..., o_T; the first observation is already
        # consumed by the initialization, so iterate over O[1:]
        for observation in O[1:]:
            temp = np.zeros_like(alpha)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += alpha[j] * self.A[j][i]
                temp[i] = temp[i] * self.B[i][observation]
            alpha = temp
        # Termination: P(O|lambda) = sum_i alpha_T(i)
        return np.sum(alpha)

Predict

model = ForwardModel(Pi, A, B)
model.predict(O)

0.024612384

Readers are encouraged to verify this result by hand.
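The two inner loops of the recursion are just a matrix-vector product, so the whole forward pass can be vectorized with NumPy. A sketch (the function name `forward_vectorized` is illustrative; parameters as defined above):

```python
import numpy as np

Pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([[0, 1, 0, 0],
              [0.4, 0, 0.6, 0],
              [0, 0.4, 0, 0.6],
              [0, 0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])
O = [1, 1, 0, 0, 1]

def forward_vectorized(O, Pi, A, B):
    # alpha_1 = pi (*) b(o_1), elementwise over states
    alpha = Pi * B[:, O[0]]
    # alpha_{t+1} = (A^T alpha_t) (*) b(o_{t+1})
    for o in O[1:]:
        alpha = (A.T @ alpha) * B[:, o]
    # P(O|lambda) = sum_i alpha_T(i)
    return alpha.sum()

print(forward_vectorized(O, Pi, A, B))
```

Each step is $O(N^2)$, so the full pass is $O(TN^2)$, the same complexity as the looped version but far faster in practice.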

Backward Algorithm

Define the backward probability as
$$\beta_{t}(i)=P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right)$$
Then
$$\begin{aligned} P(O|\lambda) &= P(o_1,o_2,...,o_T|\lambda) \\ &=\sum_{i=1}^{N}P(o_1,o_2,...,o_T,i_1=q_i|\lambda) \\ &=\sum_{i=1}^{N}P(o_1|o_2,...,o_T,i_1=q_i,\lambda)\cdot P(o_2,...,o_T,i_1=q_i|\lambda) \\ &=\sum_{i=1}^{N}P(o_1|i_1=q_i,\lambda)\cdot P(o_2,...,o_T|i_1=q_i,\lambda)\cdot P(i_1=q_i|\lambda) \\ &=\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i \end{aligned}$$

Algorithm steps:

(1) Initialization
$$\beta_T(i) = 1$$

(2) Recursion
$$\begin{aligned} \beta_{t}(i) &= P\left(o_{t+1}, o_{t+2}, \cdots, o_{T} \mid i_{t}=q_{i}, \lambda\right) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T},i_{t+1}=q_j \mid i_{t}=q_{i}, \lambda\right) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,i_{t}=q_{i}, \lambda\right)\cdot P(i_{t+1}=q_j|i_t=q_i,\lambda) \\ &=\sum_{j=1}^N P\left(o_{t+1}, o_{t+2}, \cdots, o_{T}\mid i_{t+1}=q_j,i_{t}=q_{i}, \lambda\right)\cdot a_{ij} \end{aligned}$$
The next step is the crucial one. By the conditional-independence structure of the HMM's graphical model, once $i_{t+1}=q_j$ is given, the observations $o_{t+1},\dots,o_T$ are independent of $i_t$: every path in the graph from $i_t$ to those observations passes through $i_{t+1}$, so conditioning on $i_{t+1}$ blocks it. The condition $i_t=q_i$ can therefore be dropped:
$$\begin{aligned} \beta_t(i) &= \sum_{j=1}^{N}P(o_{t+1},...,o_T|i_{t+1}=q_j,\lambda)\cdot a_{ij} \\ &=\sum_{j=1}^{N}P(o_{t+2},...,o_T|i_{t+1}=q_j,\lambda)\cdot P(o_{t+1}|o_{t+2},...,o_T,i_{t+1}=q_j,\lambda) \cdot a_{ij} \\ &= \sum_{j=1}^{N}\beta_{t+1}(j)\cdot b_j(o_{t+1})\cdot a_{ij} \end{aligned}$$
(3) Termination
$$P(O|\lambda) =\sum_{i=1}^{N}b_i(o_1)\cdot \beta_1(i)\cdot \pi_i$$

Implementation

The data and parameter definitions are the same as above and are omitted here.

Define the model

class BackwardModel:
    def __init__(self, Pi, A, B) -> None:
        self.Pi = Pi
        self.A = A
        self.B = B
        self.N = len(A)  # number of hidden states

    def predict(self, O):
        # Initialization: beta_T(i) = 1
        beta = np.ones(shape=(self.N,))
        # Recursion over o_T, ..., o_2; the first observation only enters
        # at termination, so iterate over reversed(O[1:])
        for o in reversed(O[1:]):
            temp = np.zeros_like(beta)
            for i in range(self.N):
                for j in range(self.N):
                    temp[i] += self.A[i][j] * self.B[j][o] * beta[j]
            beta = temp
        # Termination: P(O|lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
        res = 0
        for i in range(self.N):
            res += self.Pi[i] * self.B[i][O[0]] * beta[i]
        return res

Predict

model = BackwardModel(Pi, A, B)
model.predict(O)

0.024612384

As expected, this matches the value obtained with the forward algorithm.
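The two algorithms are tied together by a useful identity: since $\alpha_t(i)\beta_t(i)=P(O,i_t=q_i|\lambda)$, the same probability falls out at every time step, $P(O|\lambda)=\sum_{i=1}^N \alpha_t(i)\beta_t(i)$ for any $t$. A sketch that checks this numerically (parameters as above; the variable names are illustrative):

```python
import numpy as np

Pi = np.array([0.25, 0.25, 0.25, 0.25])
A = np.array([[0, 1, 0, 0],
              [0.4, 0, 0.6, 0],
              [0, 0.4, 0, 0.6],
              [0, 0, 0.5, 0.5]])
B = np.array([[0.5, 0.5],
              [0.3, 0.7],
              [0.6, 0.4],
              [0.8, 0.2]])
O = [1, 1, 0, 0, 1]
T, N = len(O), len(Pi)

# Forward pass, keeping every alpha_t (rows are 0-indexed time steps)
alpha = np.zeros((T, N))
alpha[0] = Pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (A.T @ alpha[t - 1]) * B[:, O[t]]

# Backward pass, keeping every beta_t
beta = np.zeros((T, N))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# The same P(O | lambda) appears at every time step
probs = (alpha * beta).sum(axis=1)
assert np.allclose(probs, probs[0])
print(probs[0])
```

This identity is also the basis of the Baum-Welch (EM) learning algorithm, where the products $\alpha_t(i)\beta_t(i)$ give the posterior state probabilities.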


Reposted from blog.csdn.net/qq_41335232/article/details/121289521