Viterbi algorithm

The Viterbi algorithm is a very versatile algorithm. I first encountered it in my undergraduate communications courses, and I ran into it again recently while studying HMMs (Hidden Markov Models), so I decided to study its principle and implementation. Anyone who understands dynamic programming should find the Viterbi algorithm easy to grasp, because dynamic programming is its core.

For an HMM, one of the important tasks is to find the hidden state sequence that is most likely to have generated a given observation sequence. In general, an HMM problem can be described by the following five elements:

  1. Observations: the sequence of phenomena that are actually observed
  2. Hidden states (states): all possible hidden states
  3. Initial probability (start_probability): the initial probability of each hidden state
  4. Transition probability (transition_probability): the probability of transitioning from one hidden state to another
  5. Emission probability (emission_probability): the probability that a given hidden state produces a given observation

Here's an example from Wikipedia:

Imagine a rural clinic. The villagers have a very idealized property: they are either healthy or have a fever, and they can only find out whether they have a fever by seeing the doctor at the clinic. The clever doctor diagnoses whether a patient has a fever by asking how the patient feels, and the villagers only ever answer that they feel normal, dizzy, or cold.
Suppose a patient comes to the clinic every day and tells the doctor how he feels. The doctor believes that the patient's health evolves as a discrete Markov chain. The patient has two states, "Healthy" and "Fever", but the doctor cannot observe them directly, which means the states are "hidden" from him. Each day, the patient reports one of the following feelings, determined by his state of health: normal, cold, or dizzy. These are the observations. The whole system is a Hidden Markov Model (HMM).
The doctor knows the general health of the villagers and what symptoms patients with and without fever usually complain about; in other words, the doctor knows the parameters of the hidden Markov model. The five elements mentioned above are then represented as follows:

states = ('Healthy', 'Fever')
 
observations = ('normal', 'cold', 'dizzy')
 
start_probability = {'Healthy': 0.6, 'Fever': 0.4}
 
transition_probability = {
   'Healthy' : {'Healthy': 0.7, 'Fever': 0.3},
   'Fever' : {'Healthy': 0.4, 'Fever': 0.6},
   }
 
emission_probability = {
   'Healthy' : {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
   'Fever' : {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6},
   }

 

The corresponding state transition diagram is as follows:

Now suppose a patient comes to see the doctor three days in a row, and the doctor observes that he feels normal on the first day, cold on the second day, and dizzy on the third day. The question for the doctor is: which sequence of health states best explains these observations? The Viterbi algorithm answers this question.

First, look at the problem intuitively. In an HMM, each observed phenomenon has a probability for each hidden state behind it, and we only need to select the state with the largest probability. However, this probability also depends on the previous state (the Markov assumption), so each observed phenomenon cannot be considered independently.

To compare time complexities, the problem is now generalized: suppose the length of the observation sequence is m and the number of hidden states is n. This gives the hidden state transition diagram below (for ease of presentation, the diagram only shows the case n = 3).

If brute-force enumeration is used to list all possible state sequences and compare their probabilities, the time complexity is $O(n^m)$, which is clearly unacceptable. The Viterbi algorithm reduces the time complexity to $O(m \cdot n^2)$.
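
For reference, the brute-force baseline can be sketched as follows (the function name brute_force and its structure are my own illustration, not from the original post); it simply enumerates all $n^m$ candidate hidden state sequences:

from itertools import product

def brute_force(obs_seq, states, start_probability,
                transition_probability, emission_probability):
    # Enumerate all n**m hidden state sequences and keep the most probable one.
    best_prob, best_seq = 0.0, None
    for seq in product(states, repeat=len(obs_seq)):
        prob = start_probability[seq[0]] * emission_probability[seq[0]][obs_seq[0]]
        for prev, cur, obs in zip(seq, seq[1:], obs_seq[1:]):
            prob *= transition_probability[prev][cur] * emission_probability[cur][obs]
        if prob > best_prob:
            best_prob, best_seq = prob, seq
    return best_prob, best_seq

# For the three-day clinic example this yields (up to floating-point rounding):
# (0.01512, ('Healthy', 'Healthy', 'Fever'))
print(brute_force(('normal', 'cold', 'dizzy'), states, start_probability,
                  transition_probability, emission_probability))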

Now consider the problem from the viewpoint of dynamic programming. Following the figure above, let last_state record, for each hidden state, the probability of the best path ending in that state at the previous observation, and let curr_state record the same quantity for the current observation. Solving for curr_state then depends only on last_state, and the dependency can be expressed by the following Python code:

# For each current hidden state cs, keep only the most probable way of reaching it
# from some previous hidden state ls.
for cs in states:
    curr_state[cs] = max(last_state[ls] *
                         transition_probability[ls][cs] *
                         emission_probability[cs][observation]
                         for ls in states)

This calculation uses the transition probability transition_probability and the emission probability emission_probability to select the previous state ls that is most likely to lead to the current state cs.

In addition to the calculation above, a path path is maintained for each hidden state, where path[s] stores the optimal state sequence leading to state s. After selecting the previous state ls that is most likely to lead to the current state cs, the path of cs is updated to path[ls] followed by cs. Once the whole observation sequence has been traversed in this way, we only need to select the state with the largest probability in curr_state as the final hidden state, and take its entry in path as the state sequence leading up to it.
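
Putting the pieces together, a minimal sketch of the whole procedure might look like this (the function name viterbi is my own; the variable names follow the description above and the clinic example's parameters):

def viterbi(obs_seq, states, start_probability,
            transition_probability, emission_probability):
    # Probabilities of the best paths ending in each state after the first observation.
    last_state = {s: start_probability[s] * emission_probability[s][obs_seq[0]]
                  for s in states}
    # path[s] holds the best state sequence ending in state s so far.
    path = {s: [s] for s in states}

    for observation in obs_seq[1:]:
        curr_state, new_path = {}, {}
        for cs in states:
            # Select the previous state ls most likely to lead to cs.
            prob, ls = max((last_state[ls] *
                            transition_probability[ls][cs] *
                            emission_probability[cs][observation], ls)
                           for ls in states)
            curr_state[cs] = prob
            new_path[cs] = path[ls] + [cs]
        last_state, path = curr_state, new_path

    # The most probable final state determines the answer; its path is the full sequence.
    prob, final_state = max((last_state[s], s) for s in states)
    return prob, path[final_state]

# For the three-day clinic example this prints (up to floating-point rounding):
# (0.01512, ['Healthy', 'Healthy', 'Fever'])
print(viterbi(('normal', 'cold', 'dizzy'), states, start_probability,
              transition_probability, emission_probability))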

From the analysis above, the observation sequence only needs to be traversed once, which costs $O(m)$, and for each observation the most likely previous state of every current state is computed, which costs $O(n^2)$, so the overall time complexity is $O(m \cdot n^2)$.
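
Concretely, for the clinic example above (m = 3 observations, n = 2 hidden states), each step after the first evaluates $n^2 = 4$ candidate transitions. Writing $V_t(s)$ for the probability of the best path ending in state $s$ at step $t$ (this is exactly last_state / curr_state in the code), the recursion works out by hand as:

$$
\begin{aligned}
V_1(\text{Healthy}) &= 0.6 \times 0.5 = 0.3, & V_1(\text{Fever}) &= 0.4 \times 0.1 = 0.04,\\
V_2(\text{Healthy}) &= \max(0.3 \times 0.7,\; 0.04 \times 0.4) \times 0.4 = 0.084, & V_2(\text{Fever}) &= \max(0.3 \times 0.3,\; 0.04 \times 0.6) \times 0.3 = 0.027,\\
V_3(\text{Healthy}) &= \max(0.084 \times 0.7,\; 0.027 \times 0.4) \times 0.1 = 0.00588, & V_3(\text{Fever}) &= \max(0.084 \times 0.3,\; 0.027 \times 0.6) \times 0.6 = 0.01512.
\end{aligned}
$$

The largest final value is $V_3(\text{Fever}) = 0.01512$, and backtracking the choices gives Healthy → Healthy → Fever, matching the output of the sketches above.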

When HMMs are applied in NLP, the word sequence is treated as the observation sequence and information such as part of speech or labels is treated as the hidden states; the hidden state sequence can then be solved with the Viterbi algorithm. This is how HMMs are applied in word segmentation, part-of-speech tagging, and named entity recognition, where the key is usually to estimate the initial probability (start_probability), transition probability (transition_probability), and emission probability (emission_probability) mentioned above.
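
As a toy illustration of the NLP case (all numbers below are made up purely for illustration and are not from any real corpus), the same viterbi function sketched above can be reused with words as observations and part-of-speech tags as hidden states:

pos_states = ('Noun', 'Verb')
pos_start_probability = {'Noun': 0.7, 'Verb': 0.3}
pos_transition_probability = {
    'Noun': {'Noun': 0.2, 'Verb': 0.8},
    'Verb': {'Noun': 0.7, 'Verb': 0.3},
}
pos_emission_probability = {
    'Noun': {'dogs': 0.6, 'bark': 0.4},
    'Verb': {'dogs': 0.1, 'bark': 0.9},
}

# Most likely tag sequence for the sentence "dogs bark": ['Noun', 'Verb']
print(viterbi(('dogs', 'bark'), pos_states, pos_start_probability,
              pos_transition_probability, pos_emission_probability))

In practice, these three probability tables are estimated from a labeled corpus by counting.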

In the field of communications, if the received encoded information is regarded as the observation sequence and the corresponding decoded information as the hidden states, the Viterbi algorithm can likewise find the most probable decoding.

It should be noted that the Viterbi algorithm is suitable for optimization problems with multiple steps and multiple choices per step, such as the network below, which is called a "lattice" in "The Beauty of Mathematics". Each step has multiple choices; the optimal solution leading to each choice of the current step is kept, and the optimal path is recovered by backtracking.

 

To emphasize: the Viterbi algorithm can be used to solve the HMM decoding problem, but it can also be used for any other problem that fits the description above.

 

From the blog post: http://wulc.me/2017/03/02/Viterbi algorithm/

 
