Three Problems and Solutions of Hidden Markov Models

This post introduces the Hidden Markov Model and the solutions to its three basic problems.

The Hidden Markov Model is a statistical model for sequence data. The process it describes is: a hidden Markov chain randomly generates an unobservable sequence of states, and each state in turn generates an observation, thereby yielding a random sequence of observations.

In this process, the unobservable sequence is called the state sequence, and the sequence of observations it generates is called the observation sequence.

The process can be described by the following diagram:

In the figure above, $X_1, X_2, \dots, X_T$ is the hidden state sequence and $O_1, O_2, \dots, O_T$ is the observation sequence.

Hidden Markov Models are determined by three probabilities:

  1. The initial probability distribution, i.e. the probability distribution of the initial hidden state, denoted $\pi$;
  2. The state transition probability distribution, i.e. the transition probability distribution between hidden states, denoted $A$;
  3. The observation probability distribution, i.e. the probability distribution of the observations generated from each hidden state, denoted $B$.

The above three probability distributions can be said to be the parameters of the hidden Markov model, and according to these three probabilities, a hidden Markov model $\lambda = (A, B, \pi)$ can be determined.
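To make the notation concrete, here is a minimal sketch of a toy HMM with two hidden states and three observation symbols, written with NumPy; the numbers are purely illustrative and not from the original post.

```python
import numpy as np

# A[i, j] = P(next hidden state = q_j | current hidden state = q_i)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# B[i, k] = P(observation = v_k | hidden state = q_i)
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
# pi[i] = P(initial hidden state = q_i)
pi = np.array([0.6, 0.4])
```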

The three basic problems of Hidden Markov Models are:

  1. Probability calculation problem. Given the model $\lambda = (A, B, \pi)$ and the observation sequence $O$, compute the probability $P(O|\lambda)$ of the observation sequence under the model $\lambda$;
  2. Learning problem. Given the observation sequence $O$, estimate the model parameters $\lambda$ that maximize the probability of the observation sequence, i.e. maximize $P(O|\lambda)$;
  3. Decoding problem. Given the model $\lambda = (A, B, \pi)$ and the observation sequence $O$, find the hidden state sequence $X$ that is most likely to have produced the observation sequence, i.e. the sequence $X$ that maximizes $P(X|O, \lambda)$.

 

1. Probability calculation problem

In principle, the probability calculation problem can be solved by exhaustive enumeration: enumerate all possible hidden state sequences, compute the probability that each of them generates the observation sequence, and sum them up. If the observation sequence has length $T$ and there are $m$ possible hidden states at each position, the time complexity of this method is $O(m^T)$, which is clearly unacceptable. In practice the forward algorithm and the backward algorithm are used instead. Both reduce the time complexity through dynamic programming; they differ only in the direction of computation.
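For contrast, here is a brute-force sketch of the exhaustive method, assuming the toy `A`, `B`, `pi` arrays above and observations encoded as integer indices; it literally enumerates all $m^T$ hidden sequences, which is exactly why the cost explodes.

```python
from itertools import product
import numpy as np

def brute_force_prob(obs, A, B, pi):
    m = A.shape[0]
    total = 0.0
    # Enumerate every possible hidden state sequence of the same length as obs.
    for states in product(range(m), repeat=len(obs)):
        p = pi[states[0]] * B[states[0], obs[0]]        # P(x_1) * P(o_1 | x_1)
        for t in range(1, len(obs)):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total
```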

1.1 Forward Algorithm

The forward algorithm needs to define the forward probability first:

The forward probability is defined as the probability of observing $o_1, o_2, \dots, o_t$ up to time $t$ and of the hidden state at time $t$ being the $i$-th hidden state (denoted $q_i$). It can be written as
$$\alpha_t(i) = P(o_1, o_2, \dots, o_t, x_t = q_i|\lambda)$$

Once the forward probability is defined, it can be computed recursively, and from it the probability of the observation sequence $P(O|\lambda)$ follows.


The initial condition is:
$$\alpha_1(i) = \pi_i b_i(o_1)~~i=1,\dots,m$$
where $m$ is the number of hidden states, $\pi_i$ is the initial probability of the $i$-th hidden state, and $b_i(o_1)$ is the probability that the $i$-th hidden state generates the observation $o_1$.


The recursion is:
$$\alpha_{t+1}(i) = \Big(\sum_{j=1}^m \alpha_t(j) a_{ji}\Big) b_i(o_{t+1})~~i=1,\dots,m$$
where $a_{ji}$ is the transition probability from hidden state $j$ to hidden state $i$. The forward recursion can be seen more intuitively in the following figure:

 

The final calculated probability is (where $T$ is the length of the observation sequence):
$$P(O|\lambda) = \sum_{i=1}^{m}\alpha_T(i)$$
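A minimal sketch of the forward algorithm, assuming the observation sequence is a list of integer indices and the parameters are NumPy arrays as in the toy example above:

```python
import numpy as np

def forward(obs, A, B, pi):
    T, m = len(obs), A.shape[0]
    alpha = np.zeros((T, m))
    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        # alpha_{t+1}(i) = (sum_j alpha_t(j) * a_ji) * b_i(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                     # P(O | lambda) = sum_i alpha_T(i)
```

For example, `forward([0, 2, 1], A, B, pi)` would return the full table of forward probabilities together with $P(O|\lambda)$.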

1.2 Backward Algorithm

Similar to the forward algorithm, the backward algorithm can also solve this problem, but it works from back to front. Again, the backward probability must be defined first:

The backward probability is the probability of observing $o_{t+1}, o_{t+2}, \dots, o_T$ from time $t+1$ to $T$, given that the hidden state at time $t$ is the $i$-th hidden state (denoted $q_i$). It is written as:
$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid x_t = q_i, \lambda)$$

The initial condition is:
$$\beta_T(i) = 1~~i=1,2,\dots,m$$
The probability is 1 here because $\beta_T(i)$ would describe the observations after time $T$, but there is no time after $T$ and nothing left to observe, so the value is set to 1 for every state.

The recurrence formula is:
$$\beta_t(i) = \sum_{j=1}^m a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)~~i=1,2,\dots,m$$

The symbols in the above formula are the same as those in the forward algorithm, and the process can be more intuitively understood through the following figure:


The final calculated probability is:
$$P(O|\lambda) = \sum_{i=1}^m \pi_ib_i(o_1)\beta_1(i)$$
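A matching sketch of the backward algorithm under the same assumptions; the last line implements the termination formula above:

```python
import numpy as np

def backward(obs, A, B, pi):
    T, m = len(obs), A.shape[0]
    beta = np.zeros((T, m))
    beta[-1] = 1.0                                         # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta, (pi * B[:, obs[0]] * beta[0]).sum()       # P(O | lambda)
```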

Analysis shows that the time complexity of both the forward and the backward algorithm is $O(m^2 T)$, where $m$ is the number of hidden states and $T$ is the sequence length.

2. Learning problem

The learning problem is to infer the model parameters from the observation sequence, which corresponds to maximum likelihood estimation in probability theory. However, the likelihood here involves hidden variables, so it cannot be maximized by direct differentiation; this type of problem is solved with the EM algorithm.

The EM algorithm is a class of algorithms for maximum likelihood estimation of the parameters of probabilistic models with hidden variables. Its concrete instance for the Hidden Markov Model is the Baum-Welch algorithm.

Note: the premise for using the EM algorithm here is that only the observation sequence is given. If both the observation sequence and the hidden sequence are given, the parameters can be estimated directly by maximum likelihood.

Only the flow of the Baum-Welch algorithm is given here; the derivation is omitted:
1. Initialize the model parameters: choose $a_{ij}^{(0)}, b_j^{(0)}, \pi_i^{(0)}$ to obtain the initial model $\lambda^{(0)} = (A^{(0)}, B^{(0)}, \pi^{(0)})$;
2. E step: compute two intermediate variables $\gamma_t(i)$ and $\xi_t(i,j)$, whose meanings are as follows:
$\gamma_t(i)$: given the model $\lambda$ and the observation sequence $O$, the probability that the hidden state at time $t$ is $q_i$, i.e. $\gamma_t(i) = P(x_t = q_i \mid O, \lambda)$;
$\xi_t(i,j)$: given the model $\lambda$ and the observation sequence $O$, the probability that the hidden state at time $t$ is $q_i$ and the hidden state at time $t+1$ is $q_j$, i.e. $\xi_t(i,j) = P(x_t = q_i, x_{t+1} = q_j \mid O, \lambda)$.

Combining the definitions of the forward and backward probabilities, these two intermediate variables are computed as follows ($m$ is the total number of hidden states):
$$\gamma_t(i) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^m \alpha_t(j) \beta_t(j)}$$
$$\xi_t(i,j) = \frac{\alpha_t(i) a_{ij} b_j(o_{t+1}) \beta_{t+1}(j)}{\sum_{p=1}^m \sum_{q=1}^m \alpha_t(p) a_{pq} b_q(o_{t+1}) \beta_{t+1}(q)}$$

3. M step: use the two intermediate variables obtained in the E step to update the model parameters. The update formulas are:
$$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$
$$b_j(k) = \frac{\sum_{t=1}^T \gamma_t(j) I(o_t = v_k)}{\sum_{t=1}^T \gamma_t(j)}$$
$$\pi_i = \gamma_1(i)$$

In the formulas above, $I(o_t = v_k)$ is 1 when the observation at time $t$ is $v_k$ and 0 otherwise.

The E and M steps are iterated until convergence, and the converged parameter values are taken as the model parameters; a minimal sketch of one such iteration is given below.
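This sketch of one Baum-Welch iteration reuses the `forward()` and `backward()` functions sketched earlier; the function name `baum_welch_step` and the `n_obs` argument (number of distinct observation symbols) are my own additions for illustration.

```python
import numpy as np

def baum_welch_step(obs, A, B, pi, n_obs):
    obs = np.asarray(obs)
    T, m = len(obs), A.shape[0]

    # E step: compute gamma_t(i) and xi_t(i, j) from the forward/backward probabilities.
    alpha, _ = forward(obs, A, B, pi)
    beta, _ = backward(obs, A, B, pi)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)          # gamma_t(i)
    xi = np.zeros((T - 1, m, m))                       # xi_t is defined for t = 1..T-1
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi[t] /= xi[t].sum()

    # M step: re-estimate A, B and pi from gamma and xi.
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((m, n_obs))
    for k in range(n_obs):
        # numerator keeps only the time steps where the indicator I(o_t = v_k) is 1
        B_new[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    pi_new = gamma[0]
    return A_new, B_new, pi_new
```

Repeating this step until the parameters stop changing gives the converged estimate described above.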

3. Decoding problem

The decoding problem can also, in principle, be solved by exhaustive enumeration: enumerate all possible hidden sequences, compute the probability of the observation sequence under each of them, and pick the hidden sequence with the highest probability. However, enumerating all hidden sequences again has exponential time complexity, so, as with the first problem, this is rarely done in practice.
In practice, the time complexity of the decoding problem is reduced by dynamic programming, and there is already a mature solution: the famous Viterbi algorithm.
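A minimal sketch of the Viterbi algorithm under the same encoding assumptions as above (the post only names the algorithm; this implementation is an illustration, not the author's code). `delta[t, i]` holds the highest probability of any hidden path ending in state $q_i$ that explains $o_1, \dots, o_t$, and `psi` records the argmax for backtracking.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    T, m = len(obs), A.shape[0]
    delta = np.zeros((T, m))
    psi = np.zeros((T, m), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A        # trans[j, i] = delta_{t-1}(j) * a_ji
        psi[t] = trans.argmax(axis=0)            # best previous state for each current state i
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                # backtrack the most likely hidden sequence
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path))
```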

 

From the blog post: http://wulc.me/2017/07/14/Three major problems of Hidden Markov Model and solution methods/
