HMM and Viterbi algorithm

1. Hidden Markov Model

1. Introduction

  The hidden Markov model was not invented by the Russian mathematician Markov; it was proposed by the American mathematician Baum, after whom the standard training method for hidden Markov models (the Baum-Welch algorithm) is also named. The hidden Markov model has long been regarded as one of the fastest and most effective tools for solving many natural language processing problems.

2. Markov Assumption

The probability distribution of each state S_t in the random process depends only on the previous state S_{t-1}, that is, P(S_t | S_1, S_2, S_3, …, S_{t-1}) = P(S_t | S_{t-1}).

  For example, in weather forecasting we bluntly assume that today's temperature depends only on yesterday's and has nothing to do with the day before yesterday's. Of course, this assumption may not suit every application, but it at least gives approximate solutions to many problems that were previously hard to solve.

3. Markov chain

  A random process that conforms to the Markov assumption is called a Markov process, also known as a Markov chain.

 

Figure: Markov chain

 

  In this Markov chain, the four circles represent four states, each edge represents a possible transition between states, and the weight on an edge is the transition probability. The hidden Markov model is an extension of the Markov chain above: the state S_t at any time t is not visible, so an observer cannot infer parameters such as the transition probabilities from a state sequence S_1, S_2, S_3, …, S_T. However, the hidden Markov model outputs a symbol O_t at each time t, and O_t depends on S_t and only on S_t; this is called the independent output assumption. The structure of the hidden Markov model is shown in the figure below, where the hidden states S_1, S_2, S_3, … form an ordinary Markov chain. Baum called such models "hidden" Markov models.

 

Figure: Hidden Markov Model
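To make these two assumptions concrete, the short Python sketch below samples a hidden weather sequence and one seaweed observation per day. The two-state parameters (and the names states, start_p, trans_p, emit_p) are made up purely for illustration; the actual example parameters appear later in this article.

```python
import random

# Hypothetical two-state parameters, for illustration only.
states = ["Sunny", "Rainy"]
start_p = {"Sunny": 0.7, "Rainy": 0.3}
trans_p = {"Sunny": {"Sunny": 0.8, "Rainy": 0.2},
           "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
emit_p = {"Sunny": {"Dry": 0.9, "Soggy": 0.1},
          "Rainy": {"Dry": 0.2, "Soggy": 0.8}}

def sample(dist):
    """Draw one outcome from a {value: probability} dictionary."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

def simulate(days):
    """Sample `days` steps of the HMM."""
    s = sample(start_p)                       # initial state
    hidden, observed = [s], [sample(emit_p[s])]
    for _ in range(days - 1):
        s = sample(trans_p[s])                # S_t depends only on S_{t-1}
        hidden.append(s)
        observed.append(sample(emit_p[s]))    # O_t depends only on S_t
    return hidden, observed

print(simulate(5))
```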

 

 

4. Three basic problems of the hidden Markov model

(1) Given a model, how do we compute the probability of a particular output sequence?

  Forward-Backward algorithm

(2) Given a model and an output sequence, how do we find the state sequence most likely to have produced that output?

  Viterbi Algorithm

(3) Given a sufficient amount of observed data, how do we estimate the parameters of the hidden Markov model?

      In practice the hidden Markov model is trained by estimating the parameters P(S_t | S_{t-1}) and P(O_t | S_t) directly from the observed data with an unsupervised training algorithm, chiefly the Baum-Welch algorithm.
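To illustrate problem (1), here is a minimal sketch of the forward recursion (the forward half of the Forward-Backward computation), assuming the same dictionary-of-dictionaries model layout as the simulation sketch earlier; it sums over all hidden paths rather than maximizing.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """P(o_1 .. o_T | model): total probability of the observation sequence."""
    # alpha[s] = P(o_1 .. o_t, S_t = s), updated one time step at a time
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())
```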

 

5. Five-tuple of Hidden Markov Model

An HMM is a five-tuple (O, Q, q_0, A, B):

  O: {o_1, o_2, …, o_T} is the sequence of observed outputs (the observation sequence).

  Q: {q_1, q_2, …, q_V} is the set of hidden states (the values taken by the implicit sequence).

  A_ij = P(q_j | q_i): the state transition probability distribution.

  B_ij = P(o_j | q_i): the emission probability distribution.

  q_0 is the initial state; some formulations also include a final state.
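In code, the five components can be bundled into a small container; the sketch below is one possible representation (the field names are illustrative, not part of any standard API):

```python
from dataclasses import dataclass

@dataclass
class HMM:
    states: list        # Q: the set of hidden states
    symbols: list       # O: the possible observation symbols
    start_p: dict       # initial state distribution over Q
    trans_p: dict       # A: trans_p[q_i][q_j] = P(q_j | q_i)
    emit_p: dict        # B: emit_p[q_i][o_j]  = P(o_j | q_i)
```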

 

2. Viterbi Algorithm

1. Introduction

  The Viterbi algorithm is a special but very widely used dynamic programming algorithm. It was proposed for the shortest-path problem on a particular directed graph, the lattice (or "fence" network). Any problem that can be described by a hidden Markov model can be decoded with the Viterbi algorithm, including problems in today's digital communication, speech recognition, machine translation, pinyin-to-Chinese-character conversion, word segmentation, and so on.

 

Figure: Fence Network

 

2. The basis of the Viterbi algorithm

(1) If the most probable path P (equivalently, the shortest path) passes through some node, say X_22 in the figure below, then the sub-path Q of P from the starting point S to X_22 must itself be the shortest path from S to X_22. Otherwise, replacing Q with the true shortest path R from S to X_22 would give a path shorter than P, a contradiction.

(2) The path from S to E must pass through some node at time i. If there are k possible states at time i, and we record the shortest path from S to each of these k nodes, then the overall shortest path must pass through one of them. Thus, at any moment only a very limited number of shortest paths needs to be kept.

(3) Combining the two points above: when we move from time i to time i+1, suppose the shortest path from S to every node at time i has already been found and recorded at those nodes. Then, to compute the shortest path from S to a node X_{i+1,j} at time i+1, we only need the recorded shortest paths to the k nodes at time i and the distances from those k nodes to X_{i+1,j}.

 

3. Summary of Viterbi Algorithm

(1) Starting from the point S, for every node of the first stage X_1 (suppose there are n_1 of them), compute the distance d(S, X_{1i}), where X_{1i} denotes any node of stage 1. Since there is only one step, these distances are already the shortest distances from S to each of these nodes.

(2) For every node of the second stage X_2, compute the shortest distance from S to it. For a particular node X_{2i}, the path from S to it may pass through any of the n_1 nodes X_{1j} of stage 1, and the corresponding path length is d(S, X_{1j}) + d(X_{1j}, X_{2i}). Since j has n_1 possible values, we compute them one by one and take the minimum:

d(S, X_{2i}) = min_{j=1..n_1} [ d(S, X_{1j}) + d(X_{1j}, X_{2i}) ]

Thus each node of the second stage requires n_1 such calculations. If this stage has n_2 nodes, computing the distances from S to all of them takes O(n_1 · n_2) calculations.

(3) Proceed in the same way from the second stage to the third, and so on until the last stage, which yields the shortest path through the entire lattice. The computational cost of each step is proportional to the product of the numbers of nodes n_i and n_{i+1} of the two adjacent stages, i.e. O(n_i · n_{i+1}).

(4) If the stage with the most nodes has D nodes, i.e. the width of the lattice is D, then the cost of any single step is at most O(D^2). Since the lattice has length N, the complexity of the whole Viterbi algorithm is O(N·D^2).
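The procedure above can be written down almost verbatim. The sketch below is a minimal shortest-path version; `layers` (the nodes of each stage), `start_len` (the distances d(S, x) to the first stage) and `edge_len` (the distance between nodes of adjacent stages) are hypothetical inputs, and node labels are assumed to be unique across stages.

```python
def viterbi_shortest_path(layers, start_len, edge_len):
    """Shortest S -> E path through a lattice of N stages with at most D nodes each.

    Runs in O(N * D^2): each stage keeps only the best incoming path per node.
    """
    best = dict(start_len)               # best[x] = length of the shortest S -> x path
    back = {x: None for x in layers[0]}  # back[x] = predecessor of x on that path
    for prev_layer, layer in zip(layers, layers[1:]):
        for x in layer:
            # extend only the recorded shortest paths into the previous stage
            pred = min(prev_layer, key=lambda u: best[u] + edge_len(u, x))
            best[x] = best[pred] + edge_len(pred, x)
            back[x] = pred
    # the overall shortest path ends at the best node of the last stage; backtrack
    end = min(layers[-1], key=lambda x: best[x])
    path = [end]
    while back[path[-1]] is not None:
        path.append(back[path[-1]])
    return path[::-1], best[end]
```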

 

3. HMM model + Viterbi algorithm example

1. Problem description

Suppose the seaweed humidity observed on 3 consecutive days is (Dry, Damp, Soggy). Find the most likely weather for these three days.

 

2. Known information

① There are only three kinds of weather (Sunny, Cloudy, Rainy) and four levels of seaweed humidity {Dry, Dryish, Damp, Soggy}, and the humidity of the seaweed is related to the weather.

② Hidden states: Sunny, Cloudy, Rainy;

③ Observed sequence: {Dry, Damp, Soggy}

④ Initial state distribution:

          Sunny    Cloudy    Rainy
          0.63     0.17      0.20

⑤ State transition matrix (rows: current day's weather; columns: next day's weather):

          Sunny    Cloudy    Rainy
Sunny     0.5      0.375     0.125
Cloudy    0.25     0.125     0.625
Rainy     0.25     0.375     0.375

⑥ Emission matrix (rows: weather; columns: observed seaweed humidity):

          Dry      Dryish    Damp     Soggy
Sunny     0.6      0.2       0.15     0.05
Cloudy    0.25     0.25      0.25     0.25
Rainy     0.05     0.10      0.35     0.5

3. Analysis

  According to the first-order HMM assumption, the weather on Day 2 depends only on the weather on Day 1, and the weather on Day 3 depends only on the weather on Day 2; each day's seaweed observation depends only on that day's weather.
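For reference in the calculation below, the initial distribution, transition matrix, and emission matrix above can be transcribed directly into Python dictionaries (the variable names are chosen for illustration):

```python
states = ["Sunny", "Cloudy", "Rainy"]

start_p = {"Sunny": 0.63, "Cloudy": 0.17, "Rainy": 0.20}

trans_p = {
    "Sunny":  {"Sunny": 0.5,  "Cloudy": 0.375, "Rainy": 0.125},
    "Cloudy": {"Sunny": 0.25, "Cloudy": 0.125, "Rainy": 0.625},
    "Rainy":  {"Sunny": 0.25, "Cloudy": 0.375, "Rainy": 0.375},
}

emit_p = {
    "Sunny":  {"Dry": 0.60, "Dryish": 0.20, "Damp": 0.15, "Soggy": 0.05},
    "Cloudy": {"Dry": 0.25, "Dryish": 0.25, "Damp": 0.25, "Soggy": 0.25},
    "Rainy":  {"Dry": 0.05, "Dryish": 0.10, "Damp": 0.35, "Soggy": 0.50},
}
```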

 

4. Calculation process

(1) Since Day 1 is the initial state, for each weather type we multiply its initial probability by the probability of observing Dry:

P(Day1-Sunny) = 0.63 * 0.6 = 0.378;

P(Day1-Cloudy) = 0.17 * 0.25 = 0.0425;

P(Day1-Rainy) = 0.20 * 0.05 = 0.01;

Taking max{P(Day1-Sunny), P(Day1-Cloudy), P(Day1-Rainy)}, the maximum is P(Day1-Sunny), so Sunny is the most likely weather on the first day.

 

(2) The weather on Day 2 depends on the weather on Day 1 and is also constrained by the seaweed observed on Day 2 (Damp). For each weather type we keep only the best path leading into it:

P(Day2-Sunny) = max{P(Day1-Sunny)*0.5, P(Day1-Cloudy)*0.25, P(Day1-Rainy)*0.25} * 0.15 = 0.189 * 0.15 ≈ 0.0284;

P(Day2-Cloudy) = max{P(Day1-Sunny)*0.375, P(Day1-Cloudy)*0.125, P(Day1-Rainy)*0.375} * 0.25 = 0.14175 * 0.25 ≈ 0.0354;

P(Day2-Rainy) = max{P(Day1-Sunny)*0.125, P(Day1-Cloudy)*0.625, P(Day1-Rainy)*0.375} * 0.35 = 0.04725 * 0.35 ≈ 0.0165;

In all three cases the best predecessor is Sunny. Taking max{P(Day2-Sunny), P(Day2-Cloudy), P(Day2-Rainy)}, the maximum is P(Day2-Cloudy), so Cloudy is the most likely weather on the second day.

So {Sunny, Cloudy} is the most likely weather sequence for the first two days.

 

(3) The weather on Day 3 depends on the weather on Day 2 and is also constrained by the seaweed observed on Day 3 (Soggy):

  P(Day3-Sunny) = max{P(Day2-Sunny)*0.5, P(Day2-Cloudy)*0.25, P(Day2-Rainy)*0.25} * 0.05 ≈ 0.0142 * 0.05 ≈ 0.0007;

  P(Day3-Cloudy) = max{P(Day2-Sunny)*0.375, P(Day2-Cloudy)*0.125, P(Day2-Rainy)*0.375} * 0.25 ≈ 0.0106 * 0.25 ≈ 0.0027;

  P(Day3-Rainy) = max{P(Day2-Sunny)*0.125, P(Day2-Cloudy)*0.625, P(Day2-Rainy)*0.375} * 0.5 ≈ 0.0221 * 0.5 ≈ 0.0111;



  Taking max{P(Day3-Sunny), P(Day3-Cloudy), P(Day3-Rainy)}, the maximum is P(Day3-Rainy), so Rainy is the most likely weather on the third day. The term that achieves this maximum comes from P(Day2-Cloudy), whose own best predecessor was Sunny, so backtracking gives {Sunny, Cloudy, Rainy} as the most likely weather sequence for these three days.
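As a check on the hand calculation, the probability form of the Viterbi algorithm can be run on the dictionaries transcribed above (a minimal sketch reusing states, start_p, trans_p, emit_p; keeping backpointers makes the final backtracking explicit):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence for `obs` and its probability."""
    # delta[s] = probability of the best path that ends in state s on the current day
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    backpointers = []                          # backpointers[t][s] = best predecessor of s
    for o in obs[1:]:
        back, new_delta = {}, {}
        for s in states:
            pred = max(states, key=lambda r: delta[r] * trans_p[r][s])
            new_delta[s] = delta[pred] * trans_p[pred][s] * emit_p[s][o]
            back[s] = pred
        delta = new_delta
        backpointers.append(back)
    # choose the best final state, then follow the backpointers day by day
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return path[::-1], delta[last]

path, prob = viterbi(["Dry", "Damp", "Soggy"], states, start_p, trans_p, emit_p)
print(path, prob)   # ['Sunny', 'Cloudy', 'Rainy'], about 0.0111
```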

Origin: blog.csdn.net/pku_Coder/article/details/82627986