Hidden Markov Model_Forward and Backward Algorithm_Viterbi Algorithm

1. Introduction

          The Markov model is a memoryless model: the state at time t in the sequence depends only on the state at time t-1, a direct relationship. In a hidden Markov model, by contrast, the relationship between the observations at time t and time t-1 is indirect; there is no direct dependence between the two observed variables, but the sequence of hidden variables behind them does satisfy the Markov property. Many seemingly unrelated things have an implicit relationship; to understand it, you need to find the hidden variables and discover the rules. This is a fascinating theory, and it made me see more possibilities.

2. Hidden Markov Model

        (Figure: hidden Markov model workflow, with the hidden state sequence X generating the observation sequence Y)

       Three parameters: the figure above shows the hidden Markov workflow, with X as the hidden state sequence and Y as the observation sequence. We define the model parameters as λ = (π, A, B), where π is the initial state distribution, i.e. the probability distribution of x at the first moment; A is the state transition matrix, the parameter governing X; and B is the emission matrix, the parameter connecting X and Y. With these three parameters we can express every possible evolution of x and y.
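      To make the three parameters concrete, here is a minimal sketch of how they might be written down in code for a hypothetical two-state toy model (the states, symbols, and numbers are invented purely for illustration and are not from the original article). The later sketches in this post reuse numpy and these toy arrays.

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])          # π: initial state distribution
A = np.array([[0.7, 0.3],          # A[i, j] = P(x_{t+1} = j | x_t = i), transition matrix
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],     # B[i, k] = P(y_t = k | x_t = i), emission matrix
              [0.6, 0.3, 0.1]])
O = [0, 1, 2]                      # an example observation sequence (symbol indices)
```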

      Two hypotheses: the homogeneous Markov hypothesis (the state of X at time t depends only on its state at time t-1) and the observational independence hypothesis (y_t depends only on x_t).

      Three problems: the hidden Markov model has three classic problems, corresponding to the three situations it can handle: the Evaluation problem, the Learning problem, and the Decoding problem.

        Evaluation means computing the probability of Y given λ (that is, using the model parameters to evaluate an observation sequence); Learning means finding λ for a given Y (that is, learning the model parameters from observed data); Decoding means finding X given λ and Y (that is, with the model parameters and observations known, recovering the hidden variables).

         Next, we will introduce these three issues in detail.

3. Evaluation

         The forward algorithm can be understood as computing, from left to right, the forward probability of each state at each moment; the backward algorithm computes, from right to left, the backward probability of each state at each moment. Either way, we end up expressing the probability of the observation sequence under the parameters λ. Let's go through the details below.

1. Forward algorithm

       

        Brute-force solution:

        For the evaluation problem we know the parameters λ and the observation sequence, and we want the probability of that observation sequence under λ, i.e. P(Y|λ). The most direct way is to compute P(X|λ) and then P(Y|X,λ), which gives the joint probability of the hidden sequence and the observations. Then marginalize out the hidden sequence (sum the joint probability over every possible X) to obtain P(Y|λ).

       But this approach has to traverse every possible combination of hidden state sequences, so the complexity of the whole algorithm is exponential. Assuming the number of hidden states is N and the sequence length is T, the complexity is O(T·N^T): there are N^T possible hidden sequences, and each one costs roughly T multiplications. For a small number of states and a short sequence this is acceptable, but when they are large the computational cost is fatal.
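        As a sanity check, here is a minimal sketch of the brute-force computation, assuming the toy numpy arrays pi, A, B, O defined earlier (all names are illustrative). It enumerates every one of the N^T hidden sequences, which is exactly why the cost blows up:

```python
from itertools import product

def brute_force_prob(O, pi, A, B):
    """P(O | λ) by enumerating every hidden state sequence: O(T * N^T)."""
    N, T = len(pi), len(O)
    total = 0.0
    for X in product(range(N), repeat=T):          # all N^T hidden sequences
        p = pi[X[0]] * B[X[0], O[0]]               # P(x_1) * P(o_1 | x_1)
        for t in range(1, T):
            p *= A[X[t-1], X[t]] * B[X[t], O[t]]   # P(x_t | x_{t-1}) * P(o_t | x_t)
        total += p
    return total
```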

        In order to solve this problem, we decided to find another way. So we summoned the forward algorithm.

        Forward algorithm:

        The idea of the forward algorithm is very simple. The parameters λ are known; we denote the hidden state sequence by I and the observation sequence by O.

                      1. The forward probability at the first moment: the forward probability at moment one = π multiplied by the emission probability of the first observation;

                                       α_1(i) = π_i · b_i(o_1),   i = 1, 2, …, N

                     2. The forward probability at later moments: the forward probability at time t+1 = the forward probability at time t × the state transition matrix × the emission matrix;

                                      α_{t+1}(i) = [ Σ_{j=1}^{N} α_t(j) · a_{ji} ] · b_i(o_{t+1})

                     3. Finally, we add up the forward probabilities of all N states at the last moment T to obtain the probability of the observation sequence O under the parameters λ.

                                     P(O|λ) = Σ_{i=1}^{N} α_T(i)

         Summary: the idea of the forward algorithm is to compute the forward probability of each state at each moment, from left to right. It is a dynamic programming algorithm: it abstracts the relationship between each moment and the forward probability of the previous moment, so each moment can be expressed by a simple iteration. Summing the forward probabilities of all states at the final moment gives the target probability P(O|λ). The forward probability at the first moment is π multiplied by the emission probability; the forward probability at each subsequent moment is the previous moment's forward probability multiplied by the state transition matrix and then by the emission matrix. In this way we avoid traversing every possible combination of the hidden sequence, and the complexity drops from the brute-force O(T·N^T) to O(T·N^2).
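         Below is a minimal sketch of the forward algorithm under the same assumptions as before (the toy pi, A, B, O from the earlier sketch; names are illustrative). Each step is an O(N^2) matrix-vector product, giving O(T·N^2) overall, and forward(O, pi, A, B)[1] should match brute_force_prob(O, pi, A, B):

```python
import numpy as np

def forward(O, pi, A, B):
    """Forward algorithm. alpha[t, i] = P(o_1..o_t, x_t = i | λ); returns (alpha, P(O|λ))."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialization: α_1(i) = π_i b_i(o_1)
    for t in range(1, T):
        # recursion: α_{t+1}(i) = (Σ_j α_t(j) a_{ji}) * b_i(o_{t+1})
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha, alpha[-1].sum()                  # termination: P(O|λ) = Σ_i α_T(i)
```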

2. Backward Algorithm

    

        We just learned the forward algorithm, which computes the forward probability from left to right and then sums over the states at the end. The so-called backward algorithm computes the backward probability from right to left and then sums at the first moment. Like the forward algorithm, it is a dynamic programming algorithm that exploits the relationship between each moment and the following moments.

       In the forward algorithm, we use the forward probability at time t-1 to compute the forward probability at time t. The forward probability α_t(i) represents the joint probability of the observations up to time t and the hidden variable being in state q_i at time t. Because we recurse from front to back, and the initial distribution π of the hidden variable at the first moment is known, we can start the recursion directly.

       The backward algorithm is exactly the opposite: we consider, given the hidden state at time t, the probability of the observations from time t+1 to time T. Because we recurse from back to front, we first define a quantity β to represent this assumption.

                                                       β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | i_t = q_i, λ),   with β_T(i) = 1

      Then we write down the recurrence from time t+1 back to time t for β. Corresponding to the formula below, the logic is actually very simple: β_{t+1}(j) represents the conditional probability of all observations after time t+1 given the hidden state at time t+1. Multiplying by the emission matrix B accounts for the observation at time t+1, and multiplying by the transition matrix moves the condition back to the hidden state at time t. Summing over the states at time t+1, the resulting β_t(i) represents, given the hidden state at time t, the conditional probability of the observations at time t+1 and all later moments.

                                                       β_t(i) = Σ_{j=1}^{N} a_{ij} · b_j(o_{t+1}) · β_{t+1}(j)

       Through the recursion we reach the first moment: for each state of the hidden variable at time 1, π_i b_i(o_1) accounts for the observation at time 1 and β_1(i) accounts for all subsequent observations. Adding up all N states gives the probability of the observation sequence under λ (initial distribution π, transition matrix, emission matrix).

                                                       P(O|λ) = Σ_{i=1}^{N} π_i · b_i(o_1) · β_1(i)

 

        Summary: compared with the forward algorithm, we introduced the computation logic of the backward algorithm. The idea is the same: express the relationship between the parameters λ and the observations dynamically through the relationship between states at adjacent moments. The only difference is that we recurse from right to left, so we need to define a variable β to represent our assumption, namely the probability of the observations at all later moments given the hidden variable at each moment. In the end we combine β at the first moment with π and the emission probability of the first observation to obtain the probability of the whole observation sequence. When computing, we need to initialize β at the final moment T (β_T(i) = 1). The final algorithm complexity is O(T·N^2).
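        A matching sketch of the backward algorithm, under the same assumptions as the forward sketch (the toy pi, A, B, O; illustrative names). Note that β is initialized to 1 at the final moment T, and the returned probability should match the forward algorithm's P(O|λ):

```python
import numpy as np

def backward(O, pi, A, B):
    """Backward algorithm. beta[t, i] = P(o_{t+1}..o_T | x_t = i, λ); returns (beta, P(O|λ))."""
    N, T = len(pi), len(O)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                     # initialization: β_T(i) = 1
    for t in range(T - 2, -1, -1):
        # recursion: β_t(i) = Σ_j a_{ij} b_j(o_{t+1}) β_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    prob = np.sum(pi * B[:, O[0]] * beta[0])           # P(O|λ) = Σ_i π_i b_i(o_1) β_1(i)
    return beta, prob
```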

4. Viterbi algorithm

       First, the Viterbi algorithm corresponds to the decoding problem among the three basic HMM problems: the model λ = (π, A, B) and the observation sequence O are known, and the hidden state sequence is what we need to find. A practical application is speech recognition: given a piece of speech, infer the most likely corresponding text.

        Don't worry: based on my experience of living this long, the scarier an algorithm's name sounds, the simpler it usually is. The core idea of the Viterbi algorithm is to find the optimal path, i.e. the most likely combination of hidden states. How do we find it? There is a little trick: if the globally optimal path passes through a certain state at time t, then the piece of that path from the start up to time t must itself be the optimal path ending in that state at time t. So at every moment we only need to keep, for each state, the best path that ends there; pushing this forward to the last moment and then tracing back gives the global optimum.

                                (Figure 1: all possible paths from S to E)

                                (Figure 2: the paths kept by the Viterbi algorithm)

                    Please look at the two pictures above. Picture one shows all possible paths from S to E; picture two shows the paths after Viterbi processing, which is obviously simpler. Looking closely, the second picture is simpler because at each moment only one path is kept for each state: the locally optimal path ending in that state. When state E is finally reached, tracing backwards from the best final state eliminates the other paths, leaving a single globally optimal path.
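            Here is a minimal sketch of the Viterbi algorithm under the same assumptions as the earlier sketches (the toy pi, A, B, O; illustrative names). delta keeps, for each state, the probability of the best path ending there; psi remembers which predecessor achieved it, so the global optimum can be recovered by backtracking:

```python
import numpy as np

def viterbi(O, pi, A, B):
    """Most likely hidden state sequence for O under λ = (pi, A, B)."""
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))           # delta[t, i]: best path probability ending in state i at t
    psi = np.zeros((T, N), dtype=int)  # psi[t, i]: best predecessor of state i at time t
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A         # scores[j, i] = delta_{t-1}(j) * a_{ji}
        psi[t] = scores.argmax(axis=0)             # best previous state for each current state
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]               # best final state
    for t in range(T - 1, 0, -1):                  # backtrack to recover the whole path
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta[-1].max())
```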

            

5. Summary

         This article described the hidden Markov model, which introduces hidden variables on top of the Markov model to express a sequence, and presented its three classic problems: the learning problem, the evaluation problem, and the decoding problem. We then introduced the forward and backward algorithms for solving the evaluation problem, and the Viterbi algorithm for the decoding problem. I intend to cover the learning problem in the next blog, because I especially want to watch TV now.

6. Nonsense

          I especially like talking nonsense, because it brings me health. I have always wanted to find a child who stayed by my side, quietly. When I talked to him, his answers were so profound that they shocked me. He doesn't have much interest in the things other people pursue, but is willing to do things others find boring. He likes to be in a daze, to talk to himself, and to look at the distant sky and meditate, as if he does not belong to this world. He works hard to behave like everyone else, hoping others will think he is a normal child. Only when he is alone does he feel happy. He likes to chat with invisible things, not with words but with his heart; at such times, everything around him seems to come alive. I left, but he always sat on the edge of the field under the setting sun, swinging his legs. I said, hey, it's getting dark! Go home quickly! He ran home and fell asleep in his mother's arms. I haven't seen him for a long time. I miss him, but I can't find him anymore.

Tick-Kankan

 
