[ML-13-3] Hidden Markov Model HMM: The Baum-Welch Algorithm

[ML-13-1] Hidden Markov Model HMM

[ML-13-2] Hidden Markov Model HMM: Forward and Backward Algorithms

[ML-13-3] Hidden Markov Model HMM: The Baum-Welch Algorithm

[ML-13-4] Hidden Markov Model HMM: The Viterbi Algorithm for Prediction Problems

Table of Contents

  1. Basics: Computing Common HMM Probabilities
  2. Overview of HMM Model Parameter Estimation
  3. Principle of the Baum-Welch Algorithm
  4. Derivation of the Baum-Welch Algorithm
  5. Baum-Welch Algorithm Summary

1. Basics: Computing Common HMM Probabilities

Using the forward and backward probabilities, we can derive the probability formulas for a single state and for a pair of adjacent states in the HMM.
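For reference, the forward and backward probabilities defined in the previous article of this series are:

$$\alpha_t(i) = P(q_1, q_2, \ldots, q_t, i_t = s_i \mid \lambda), \qquad \beta_t(i) = P(q_{t+1}, q_{t+2}, \ldots, q_T \mid i_t = s_i, \lambda)$$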

1.1 Probability of a single state

Given the model λ and the observation sequence Q, the probability of being in state s_i at time t is written as:

$$\gamma_t(i) = P(i_t = s_i \mid Q, \lambda)$$

The single-state probability is mainly used to judge the most likely state at each moment, which yields a state sequence that can serve as a final prediction result.

Using the definitions of the forward and backward probabilities, we know:

$$P(i_t = s_i, Q \mid \lambda) = \alpha_t(i)\,\beta_t(i)$$

Combining the two expressions above:

$$\gamma_t(i) = \frac{P(i_t = s_i, Q \mid \lambda)}{P(Q \mid \lambda)} = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$$

1.2 Joint probability of two states

Given the model λ and the observation sequence Q, the probability of being in state s_i at time t and in state s_j at time t + 1 is written as:

$$\xi_t(i,j) = P(i_t = s_i, i_{t+1} = s_j \mid Q, \lambda) = \frac{\alpha_t(i)\,a_{ij}\,b_j(q_{t+1})\,\beta_{t+1}(j)}{\sum_{r=1}^{N}\sum_{s=1}^{N} \alpha_t(r)\,a_{rs}\,b_s(q_{t+1})\,\beta_{t+1}(s)}$$

1.3 Summing the two probabilities above over time gives the expected counts used later:

$$\sum_{t=1}^{T} \gamma_t(i) = \text{expected number of times state } s_i \text{ is visited}$$

$$\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions out of state } s_i$$

$$\sum_{t=1}^{T-1} \xi_t(i,j) = \text{expected number of transitions from state } s_i \text{ to state } s_j$$
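To make these quantities concrete, here is a minimal NumPy sketch (an illustration added here, not code from the original article) that computes γ and ξ from precomputed forward and backward matrices; the names `gamma_xi`, `alpha`, `beta`, `A`, `B`, and `obs` are all illustrative:

```python
import numpy as np

def gamma_xi(alpha, beta, A, B, obs):
    """Compute single-state probabilities gamma and pairwise
    probabilities xi from forward/backward matrices.

    alpha, beta: (T, N) forward and backward probabilities
    A:           (N, N) state transition matrix, A[i, j] = a_ij
    B:           (N, M) emission matrix, B[j, k] = b_j(v_k)
    obs:         length-T sequence of observation symbol indices
    """
    T, N = alpha.shape
    # gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # xi_t(i, j) = alpha_t(i) a_ij b_j(q_{t+1}) beta_{t+1}(j), normalized
    xi = np.empty((T - 1, N, N))
    for t in range(T - 1):
        num = alpha[t][:, None] * A * B[:, obs[t + 1]][None, :] * beta[t + 1][None, :]
        xi[t] = num / num.sum()
    return gamma, xi
```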

2. Overview of HMM Model Parameter Estimation

  In this article we discuss the parameter estimation problem for the HMM: given an observation sequence Q = {q1, q2, ..., qT}, estimate the parameters of the model λ = (A, B, π) so that the probability of the observation sequence under that model, P(Q | λ), is maximized. This is the most complicated of the three classic HMM problems. Before studying it, it is recommended to read the first two articles of this series to become familiar with the HMM model and the forward-backward algorithm, as well as the summary of the EM algorithm's principles.

Parameter estimation for the HMM can be divided into two cases according to what is known. The first case is relatively simple: we know D observation sequences of length T together with their corresponding hidden state sequences, and we can directly apply the law of large numbers ("the limit of frequency is probability") to write down the HMM parameter estimates, as the three estimates below (and the counting sketch that follows them) show:

2.1 If S(i) is the number of samples whose initial hidden state is s_i, then the estimated initial state distribution is:

$$\hat{\pi}_i = \frac{S(i)}{D}$$

2.2 If S_{ij} is the number of transitions from hidden state s_i to hidden state s_j across all samples, then the estimated state transition matrix is:

$$\hat{a}_{ij} = \frac{S_{ij}}{\sum_{s=1}^{N} S_{is}}$$

2.3 If S_{jk} is the number of times a sample is in hidden state s_j while the observation is v_k, then the estimated observation probability matrix is:

$$\hat{b}_j(k) = \frac{S_{jk}}{\sum_{k'=1}^{M} S_{jk'}}$$
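As a concrete illustration of these counting estimates, here is a minimal sketch (illustrative, with the hypothetical helper name `estimate_supervised`) assuming the labelled data comes as parallel lists of hidden-state index sequences and observation index sequences:

```python
import numpy as np

def estimate_supervised(state_seqs, obs_seqs, N, M):
    """Estimate (pi, A, B) by counting, given hidden state sequences.

    state_seqs: list of D sequences of hidden state indices in [0, N)
    obs_seqs:   list of D sequences of observation indices in [0, M)
    """
    pi = np.zeros(N)
    A = np.zeros((N, N))
    B = np.zeros((N, M))
    for states, obs in zip(state_seqs, obs_seqs):
        pi[states[0]] += 1                      # initial-state count S(i)
        for t in range(len(states) - 1):
            A[states[t], states[t + 1]] += 1    # transition count S_ij
        for s, o in zip(states, obs):
            B[s, o] += 1                        # emission count S_jk
    # Normalize counts into probabilities ("frequency converges to probability")
    pi /= pi.sum()
    A /= A.sum(axis=1, keepdims=True)
    B /= B.sum(axis=1, keepdims=True)
    return pi, A, B
```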

As you can see, estimation in the first case is quite simple. In many situations, however, we cannot obtain the hidden state sequences corresponding to the observation sequences; we only have D observation sequences of length T. Can we still find suitable HMM parameters? This is the second case and the focus of this article. The most commonly used solution is the Baum-Welch algorithm, which is in fact an instance of the EM algorithm. When Baum-Welch was proposed, however, the EM algorithm had not yet been abstracted into its general form, so the method is still referred to by its own name. This is also a reminder that abstracting the common structure behind specific algorithms can itself be very valuable work.

3. Principle of the Baum-Welch Algorithm

The Baum-Welch algorithm treats the hidden state sequence I = {i1, i2, ..., iT} as latent data and applies the two steps of EM iteratively. In the E step, given the current parameters λ̄, we compute the expected log-likelihood of the complete data (Q, I):

$$Q(\lambda, \bar{\lambda}) = \sum_{I} P(I \mid Q, \bar{\lambda}) \log P(Q, I; \lambda)$$

In the M step, we maximize this expression over λ, which gives the updated model parameters:

$$\bar{\pi}_i = \gamma_1(i), \qquad \bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \bar{b}_j(k) = \frac{\sum_{t=1}^{T} \gamma_t(j)\, \mathbf{1}(q_t = v_k)}{\sum_{t=1}^{T} \gamma_t(j)}$$

These two steps are repeated until the parameters converge; Section 4 derives the update formulas in detail.
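A minimal sketch of the M-step updates for a single observation sequence, assuming `gamma` and `xi` have been computed as in Section 1 (function and variable names are illustrative):

```python
import numpy as np

def m_step(gamma, xi, obs, M):
    """One Baum-Welch M step for a single observation sequence.

    gamma: (T, N) single-state probabilities gamma_t(i)
    xi:    (T-1, N, N) pairwise probabilities xi_t(i, j)
    obs:   length-T sequence of observation symbol indices
    M:     number of distinct observation symbols
    """
    T, N = gamma.shape
    pi = gamma[0]                                # pi_i = gamma_1(i)
    A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B = np.zeros((N, M))
    for t in range(T):
        B[:, obs[t]] += gamma[t]                 # numerator: gamma_t(j) where q_t = v_k
    B /= gamma.sum(axis=0)[:, None]              # denominator: sum_t gamma_t(j)
    return pi, A, B
```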

4. Derivation of the Baum-Welch Algorithm

We first need the expression for the joint distribution P(Q, I; λ):

$$P(Q, I; \lambda) = \pi_{i_1} b_{i_1}(q_1)\, a_{i_1 i_2}\, b_{i_2}(q_2) \cdots a_{i_{T-1} i_T}\, b_{i_T}(q_T)$$

Taking the logarithm and the expectation over the hidden sequence, the expression obtained in the E step is:

$$Q(\lambda, \bar{\lambda}) = \sum_{I} P(I \mid Q, \bar{\lambda}) \left[ \log \pi_{i_1} + \sum_{t=1}^{T-1} \log a_{i_t i_{t+1}} + \sum_{t=1}^{T} \log b_{i_t}(q_t) \right]$$

In the M step we maximize this expression. Since π, A, and B each appear in exactly one of the three bracketed terms, each group of parameters can be maximized separately:

  1. First consider the derivation for the model parameter π. Since π appears only in the first bracketed term above, and summing over all hidden sequences I leaves only the marginal of the first state, the maximization problem for π, subject to the constraint $\sum_{i=1}^{N} \pi_i = 1$, is:

$$\max_{\pi} \; \sum_{i=1}^{N} P(i_1 = s_i \mid Q, \bar{\lambda}) \log \pi_i$$

To maximize this constrained expression, use the Lagrange multiplier method and set the derivative with respect to each π_i to zero:

$$L(\pi, \eta) = \sum_{i=1}^{N} P(i_1 = s_i \mid Q, \bar{\lambda}) \log \pi_i + \eta \left( \sum_{i=1}^{N} \pi_i - 1 \right), \qquad \frac{\partial L}{\partial \pi_i} = \frac{P(i_1 = s_i \mid Q, \bar{\lambda})}{\pi_i} + \eta = 0$$

Multiplying the stationarity condition by π_i gives $P(i_1 = s_i \mid Q, \bar{\lambda}) + \eta\,\pi_i = 0$; summing over i and using $\sum_{i} \pi_i = 1$ yields:

$$\eta = -\sum_{i=1}^{N} P(i_1 = s_i \mid Q, \bar{\lambda}) = -1$$

Substituting η back yields the update for the initial distribution:

$$\bar{\pi}_i = P(i_1 = s_i \mid Q, \bar{\lambda}) = \gamma_1(i)$$

  2. Likewise, to maximize the second bracketed term, use the Lagrange multiplier method with the constraints $\sum_{j=1}^{N} a_{ij} = 1$ to solve for a_ij, which gives:

$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

  3. In the same way, maximizing the third bracketed term with the Lagrange multiplier method under the constraints $\sum_{k=1}^{M} b_j(k) = 1$ solves for b_j(k):

$$\bar{b}_j(k) = \frac{\sum_{t=1}^{T} \gamma_t(j)\, \mathbf{1}(q_t = v_k)}{\sum_{t=1}^{T} \gamma_t(j)}$$

  4. Summary: maximizing the Q function term by term yields the updates for π, a_ij, and b_j(k) given above.

5. Baum-Welch Algorithm Summary

  Here we summarize the overall procedure of the Baum-Welch algorithm.

Input: D observation sequence samples Q^(d), d = 1, 2, ..., D, each of length T

Output: HMM model parameters λ = (A, B, π)

Procedure:

1) Randomly initialize all π_i, a_ij, and b_j(k).

2) For each sample d = 1, 2, ..., D, use the forward-backward algorithm to compute $\gamma_t^{(d)}(i)$ and $\xi_t^{(d)}(i,j)$.

3) Update the model parameters:

$$\pi_i = \frac{1}{D} \sum_{d=1}^{D} \gamma_1^{(d)}(i)$$

$$a_{ij} = \frac{\sum_{d=1}^{D} \sum_{t=1}^{T-1} \xi_t^{(d)}(i,j)}{\sum_{d=1}^{D} \sum_{t=1}^{T-1} \gamma_t^{(d)}(i)}$$

$$b_j(k) = \frac{\sum_{d=1}^{D} \sum_{t=1}^{T} \gamma_t^{(d)}(j)\, \mathbf{1}(q_t^{(d)} = v_k)}{\sum_{d=1}^{D} \sum_{t=1}^{T} \gamma_t^{(d)}(j)}$$

4) If the values of π_i, a_ij, and b_j(k) have converged, the algorithm ends; otherwise return to step 2) and continue iterating. (A compact code sketch of the whole procedure follows.)
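Putting the whole procedure together, here is a compact illustrative sketch (my own, not from the original article) that follows steps 1)-4) above. For brevity it runs a fixed number of iterations instead of an explicit convergence check, and a practical implementation would scale α and β or work in log space to avoid underflow on long sequences:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward recursion: alpha[t, i] = P(q_1..q_t, i_t = s_i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

def backward(A, B, obs):
    """Backward recursion: beta[t, i] = P(q_{t+1}..q_T | i_t = s_i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch(obs_seqs, N, M, n_iter=100, seed=0):
    """Baum-Welch on D observation sequences (symbol indices in [0, M))."""
    rng = np.random.default_rng(seed)
    # 1) Random initialization of pi, A, B (rows normalized to sum to 1)
    pi = rng.random(N); pi /= pi.sum()
    A = rng.random((N, N)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((N, M)); B /= B.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        pi_num = np.zeros(N)
        A_num, A_den = np.zeros((N, N)), np.zeros(N)
        B_num, B_den = np.zeros((N, M)), np.zeros(N)
        # 2) E step: accumulate gamma and xi over all sequences
        for obs in obs_seqs:
            alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
            gamma = alpha * beta
            gamma /= gamma.sum(axis=1, keepdims=True)
            pi_num += gamma[0]
            for t in range(len(obs) - 1):
                xi = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
                A_num += xi / xi.sum()
            A_den += gamma[:-1].sum(axis=0)
            for t, o in enumerate(obs):
                B_num[:, o] += gamma[t]
            B_den += gamma.sum(axis=0)
        # 3) M step: normalized expected counts
        pi = pi_num / len(obs_seqs)
        A = A_num / A_den[:, None]
        B = B_num / B_den[:, None]
    return pi, A, B
```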



Source: www.cnblogs.com/yifanrensheng/p/12684732.html