A simple attempt at a hidden Markov model (HMM)

Preface

First of all, the preface. Well, I don't have much to say. I recently learned about the hidden Markov model (HMM) in a speech recognition class, so I want to try it out and go over the basic theory carefully.

Theoretical part

Okay, first the theoretical part: what it is, why we use it, and what to do with it. These three questions matter, so I will introduce the model from these three angles.

What it is

A Hidden Markov Model (HMM) is a probabilistic graphical model used to model time series data. It is a two-layer probabilistic model: one layer consists of the hidden, unobservable states, and the other consists of the visible observations. The main idea of HMM is that although the state of the system cannot be directly observed (it is hidden), it can be indirectly inferred from the system's output (the observation data).

The following are the basic components and some key concepts of HMM:

### 1. **Basic components:**

- **Hidden States:** Represent the internal states of the system, which are not directly visible. In speech recognition, they can be phonemes, emotional states, etc.

- **Observations:** Represent the data that can be observed in each hidden state, that is, the output of the model. In speech recognition, they can be acoustic spectrum features, MFCCs (Mel-Frequency Cepstral Coefficients), etc.

- **Transition Probabilities:** Indicate the probability of moving to another hidden state at the next moment, given the hidden state at the current moment.

- **Emission Probabilities:** Represent the probability of generating a specific observation in a given hidden state.

- **Initial Probabilities:** Represent the probability that the system is in each hidden state at the beginning of the time series.
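To make these components concrete, here is a minimal sketch of what the three parameter sets look like as NumPy arrays. The 2-state, 3-symbol setup and the numbers are made up purely for illustration:

```python
import numpy as np

# Toy example (assumed for illustration): 2 hidden states, 3 observation symbols.
n_state, n_observation = 2, 3

initial_prob = np.array([0.6, 0.4])            # pi: P(state at t=1)
transition_prob = np.array([[0.7, 0.3],        # A[i, j] = P(state j at t+1 | state i at t)
                            [0.4, 0.6]])
emission_prob = np.array([[0.5, 0.4, 0.1],     # B[i, k] = P(observation k | state i)
                          [0.1, 0.3, 0.6]])

# Each row of A and B is a probability distribution, so every row sums to 1.
assert np.allclose(transition_prob.sum(axis=1), 1)
assert np.allclose(emission_prob.sum(axis=1), 1)
```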

### 2. **Basic concepts:**

- **Markov Property:** The state transitions in an HMM satisfy the Markov property: the transition probability of a state depends only on the state at the previous moment and is independent of the states at earlier moments. This is the so-called first-order Markov property.

- **Observational Independence Assumption:** It is assumed that the observation at any time depends only on the hidden state at the current moment and has nothing to do with the states and observations at other times.
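Written out in the standard textbook notation, with model parameters $\lambda = (\pi, A, B)$ for the initial, transition, and emission probabilities, these two assumptions are exactly what make the joint probability of a hidden state sequence $Q = q_1, \dots, q_T$ and an observation sequence $O = o_1, \dots, o_T$ factor into a simple product:

$$
P(O, Q \mid \lambda) = \pi_{q_1}\, b_{q_1}(o_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(o_t)
$$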

### 3. **Three classic problems of HMM:**

- **Evaluation Problem:** Given model parameters and observation sequence, calculate the probability of the observation sequence.

- **Decoding Problem:** Given model parameters and observation sequence, calculate the most likely hidden state sequence.

- **Learning Problem:** Given an observation sequence, estimate the model parameters to maximize the probability of the observation sequence.
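In the same notation, with model $\lambda = (\pi, A, B)$ and observation sequence $O$, the three problems can be written as:

$$
\text{Evaluation: } P(O \mid \lambda), \qquad
\text{Decoding: } Q^\* = \arg\max_Q P(Q \mid O, \lambda), \qquad
\text{Learning: } \lambda^\* = \arg\max_\lambda P(O \mid \lambda)
$$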

### 4. **Application fields:**

HMM is widely used in natural language processing, speech recognition, handwriting recognition, bioinformatics (such as gene recognition), financial market analysis and other fields. It is used to model time series data and perform prediction, classification, segmentation and other tasks.

The core advantage of HMM is that it can handle incomplete, uncertain, and multi-modal time series data, making it widely used in practical applications.

Why

The HMM model is widely used because it has the following advantages and characteristics:

### 1. **Time series data modeling:**
   - HMM is suitable for modeling time series data and can capture the temporal relationships and evolution patterns in the data, which is why it is widely used in natural language processing, speech recognition, handwriting recognition, and other fields.

### 2. **Handling incomplete and noisy data:**
   - HMM can handle incomplete data, that is, data in which some observations are missing or unobservable; this property makes it very useful in areas such as speech recognition.
   - It is also somewhat robust to noise and uncertainty in the observations, which helps it perform well on noisy data.

### 3. **Few model parameters:**
   - The HMM model has relatively few parameters, so an effective model can be built even when the amount of data is small, which helps avoid the curse of dimensionality.

### 4. **Probabilistic modeling:**
   - HMM is a probabilistic graphical model, so it provides a natural, mathematical description of the probability of events. This probabilistic nature makes HMM very useful for modeling uncertainty and for probabilistic inference.

### 5. **Easy to reason and learn:**
   - The inference problems of HMM (evaluation and decoding) can be solved efficiently with the forward algorithm, the backward algorithm, and the Viterbi algorithm.
   - For the learning problem, methods such as the Baum-Welch algorithm can be used to estimate the HMM parameters, so the model can adaptively learn from data.

### 6. **Flexibility:**
   - The HMM model can be adapted to problems of different complexity by increasing the number of hidden states or by introducing more complex state transition and emission probability distributions.

Therefore, the HMM model has advantages for time series data, incomplete data, noisy data, and scenarios that require probabilistic modeling, which is why it is widely used in many practical applications. Of course, in some specific scenarios, such as data with strong long-term dependencies or high-dimensional data, other more complex models may be chosen, such as long short-term memory networks (LSTM). The choice of model should be made based on the characteristics and needs of the specific problem.

What to do

Building and using an HMM model usually involves the following steps: initialization, training, evaluation (inference), and decoding. The following are the specific steps and calculation process:

### 1. Initialize model parameters:
   - **Number of hidden states (n_state):** First, determine the number of hidden states in the HMM model. This is usually determined based on the complexity of the problem and domain knowledge.
   - **Number of observation states (n_observation):** Then, determine the number of observation states, that is, the number of distinct observation symbols (for discrete observations) or the dimensionality of the observation data.

### 2. Training model parameters:
   - **Baum-Welch algorithm (a form of the Expectation-Maximization algorithm):** Train on the observation data set by iteratively optimizing the model parameters, including the initial probabilities, transition probabilities, and emission probabilities, so that the model best fits the observed data.
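For reference, the standard Baum-Welch re-estimation formulas look like this (textbook notation; $\alpha_t(i)$ and $\beta_t(i)$ are the forward and backward probabilities described in the next step):

$$
\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
$$

$$
\pi_i \leftarrow \gamma_1(i), \qquad
a_{ij} \leftarrow \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
b_j(k) \leftarrow \frac{\sum_{t:\, o_t = k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
$$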

### 3. Evaluation (inference):
   - **Forward algorithm:** Used to calculate the probability of a given sequence of observation data, that is, the evaluation problem. Through the forward algorithm, the probability of observation data appearing under the model can be calculated.
   - **Backward algorithm:** Computes, for each time step, the probability of the remaining observations given the current hidden state. Together with the forward probabilities, it supplies the quantities needed for the expectation step (E step) of the Baum-Welch algorithm.
   - **Viterbi algorithm:** Used for decoding problems, that is, finding the most likely hidden state sequence given observation data.
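Concretely, the forward and backward recursions are (standard definitions, with 1-based time indices):

$$
\alpha_1(i) = \pi_i\, b_i(o_1), \qquad
\alpha_{t+1}(j) = \Big(\sum_{i} \alpha_t(i)\, a_{ij}\Big)\, b_j(o_{t+1}), \qquad
P(O \mid \lambda) = \sum_i \alpha_T(i)
$$

$$
\beta_T(i) = 1, \qquad
\beta_t(i) = \sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)
$$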

### 4. Decoding:
   - **Viterbi algorithm:** Used to calculate the most likely hidden state sequence given the observation data. This algorithm can find the hidden state sequence that maximizes the probability given the observation data.
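In the same notation, the Viterbi recursion is:

$$
\delta_1(i) = \pi_i\, b_i(o_1), \qquad
\delta_{t}(j) = \max_i \big[\delta_{t-1}(i)\, a_{ij}\big]\, b_j(o_{t}), \qquad
\psi_t(j) = \arg\max_i \big[\delta_{t-1}(i)\, a_{ij}\big]
$$

The best path is then recovered by backtracking through $\psi$ starting from $\arg\max_i \delta_T(i)$.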

### 5. Application of the model:
   - The trained HMM model can be used for various tasks, such as speech recognition, handwriting recognition, gene recognition in bioinformatics, and so on. In these applications, HMM models are used to model time series data and perform tasks such as probabilistic inference, classification, and segmentation.

The process of calculating the HMM model mainly involves forward algorithm, backward algorithm and Viterbi algorithm. These algorithms are completed through matrix operations and recursive calculations. The specific implementation of HMM usually uses mathematical libraries (such as NumPy) to perform efficient matrix operations. It is recommended to use ready-made libraries or frameworks in practical applications, such as the hmmlearn library in Python, which provides a convenient and easy-to-use implementation of HMM models.
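As a rough sketch of what using hmmlearn looks like (assuming a recent version where discrete emissions are provided by `CategoricalHMM`; older releases expose a similar class named `MultinomialHMM`, so check the version you have installed):

```python
import numpy as np
from hmmlearn import hmm

# Two short discrete observation sequences, concatenated as one column vector,
# with `lengths` telling hmmlearn where each sequence ends.
X = np.array([[0], [1], [2], [1], [0], [2], [2], [1]])
lengths = [5, 3]

model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(X, lengths)                         # Baum-Welch training

log_prob = model.score(X, lengths)            # evaluation: log P(O | lambda)
best_logp, states = model.decode(X, lengths)  # decoding: Viterbi path
print(log_prob, states)
```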

Okay, after simply reading these three questions, we can easily find that the most important parts are these:

1. Markov Property:

  • HMM is based on the Markov property, that is, the transition probability of the current state only depends on the previous state and has nothing to do with the earlier state. This means that in HMM, the future state of the system is only related to the current state and has nothing to do with the historical state.

2. Hidden States:

  • The system in HMM is assumed to have a set of hidden states that are not directly observable. Hidden state sequences describe state changes within the system, but they are invisible to observers.

3. Observation States:

  • Each hidden state can generate one or more observed states. Observation states are external observation data that can be observed, usually input data in actual tasks.

4. Initial Probabilities:

  • The initial probability represents the probability distribution over the hidden states at the beginning of the time series. It describes how likely the system is to be in each hidden state at t=1.

5. State Transition Probabilities:

  • The state transition probability represents the probability distribution over the next hidden state, given the current hidden state. It describes how likely the system is to move from one state at time t to another state at time t+1.

6. Emission Probabilities:

  • The emission probability represents the probability distribution over the observations, given the hidden state. It describes the probability of seeing a specific observation when the system is in a particular hidden state.

7. Forward Algorithm:

  • The forward algorithm is used to calculate the probability of an observation sequence given the model. It recursively computes the probability that the system is in each hidden state and has produced the observations so far; summing over the final time step yields the probability of the entire observation sequence (a small worked example appears after this list).

8. Backward Algorithm:

  • The backward algorithm computes, for each time step and each hidden state, the probability of the remaining observations from that point to the end of the sequence. Combined with the forward probabilities, it is used for parameter estimation in the Baum-Welch algorithm.

9. Viterbi Algorithm:

  • The Viterbi algorithm is used to find the most likely hidden state sequence given the observed data. It uses dynamic programming to find the hidden state sequence that maximizes the probability of observed data.

10. Baum-Welch algorithm:

  • The Baum-Welch algorithm is an instance of the EM algorithm. Given the observation data, it iteratively optimizes the model parameters, including the initial probabilities, transition probabilities, and emission probabilities, so that the model best fits the observations.
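Here is the tiny worked example of the forward algorithm promised in item 7 (the numbers are made up purely for illustration). Suppose 2 hidden states and 2 observation symbols, with

$$
\pi = (0.6,\ 0.4), \qquad
A = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0.5 & 0.5 \\ 0.1 & 0.9 \end{pmatrix}
$$

For the observation sequence $O = (\text{symbol } 1,\ \text{symbol } 2)$ the forward pass gives:

$$
\alpha_1 = (0.6 \cdot 0.5,\ 0.4 \cdot 0.1) = (0.30,\ 0.04)
$$

$$
\alpha_2(1) = (0.30 \cdot 0.7 + 0.04 \cdot 0.4) \cdot 0.5 = 0.113, \qquad
\alpha_2(2) = (0.30 \cdot 0.3 + 0.04 \cdot 0.6) \cdot 0.9 = 0.1026
$$

$$
P(O \mid \lambda) = 0.113 + 0.1026 = 0.2156
$$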

Practical part

Okay, after getting through the somewhat involved theoretical part, next is the practical part. For this part I wrote a simple HMM model. Here is the complete code:

'''Preface: first build a simple HMM model'''

import numpy as np


class HMM():
    def __init__(self, n_state, n_observation):
        self.n_state = n_state  # number of hidden states
        self.n_observation = n_observation  # number of observation symbols

        # Initialize model parameters
        self.initial_prob = np.ones(n_state) / n_state  # initial probability vector
        self.transition_prob = np.ones((n_state, n_state)) / n_state  # transition probability matrix
        self.emission_prob = np.ones((n_state, n_observation)) / n_observation  # emission probability matrix

    def train(self, observations, iterations=100):
        # Baum-Welch algorithm for training the HMM parameters

        for _ in range(iterations):
            # Accumulators for the updated model parameters
            new_initial_prob = np.zeros(self.n_state)
            new_transition_prob = np.zeros((self.n_state, self.n_state))
            new_emission_prob = np.zeros((self.n_state, self.n_observation))

            for observation in observations:
                # Clip observation symbols into the valid range
                observation = np.clip(observation, 0, self.n_observation - 1)

                # Forward algorithm
                alpha = np.zeros((len(observation), self.n_state))
                alpha[0] = self.initial_prob * self.emission_prob[:, observation[0]]
                for t in range(1, len(observation)):
                    alpha[t] = np.dot(alpha[t - 1], self.transition_prob) * self.emission_prob[:, observation[t]]

                # Backward algorithm
                beta = np.zeros((len(observation), self.n_state))
                beta[-1] = 1
                for t in range(len(observation) - 2, -1, -1):
                    beta[t] = np.dot(self.transition_prob, self.emission_prob[:, observation[t + 1]] * beta[t + 1])

                # Accumulate parameter updates
                new_initial_prob += alpha[0]
                for t in range(len(observation) - 1):
                    new_transition_prob += (alpha[t][:, np.newaxis] * self.transition_prob *
                                            self.emission_prob[:, observation[t + 1]] * beta[t + 1])
                for t in range(len(observation)):
                    new_emission_prob[:, observation[t]] += alpha[t] * beta[t]

            # Normalize model parameters
            self.initial_prob = new_initial_prob / np.sum(new_initial_prob)
            self.transition_prob = new_transition_prob / np.sum(new_transition_prob, axis=1)[:, np.newaxis]
            self.emission_prob = new_emission_prob / np.sum(new_emission_prob, axis=1)[:, np.newaxis]

    def predict(self, observation):
        # Viterbi algorithm: predict the most likely hidden state sequence for a given observation sequence

        # Initialize variables
        T = len(observation)
        delta = np.zeros((T, self.n_state))
        psi = np.zeros((T, self.n_state), dtype=int)

        # Initialization at the first time step
        delta[0] = self.initial_prob * self.emission_prob[:, observation[0]]

        # Recursively compute the maximum-probability path
        for t in range(1, T):
            for j in range(self.n_state):
                delta[t, j] = np.max(delta[t - 1] * self.transition_prob[:, j] * self.emission_prob[j, observation[t]])
                psi[t, j] = np.argmax(delta[t - 1] * self.transition_prob[:, j])

        # Backtrack to get the most likely hidden state sequence
        states = [np.argmax(delta[-1])]
        for t in range(T - 1, 0, -1):
            states.append(psi[t, states[-1]])

        return list(reversed(states))

Next, let me explain this code.

This code implements a simple hidden Markov model (HMM), including initialization of the model parameters, the Baum-Welch algorithm for training the parameters, and the Viterbi algorithm for predicting the most likely hidden state sequence given an observation sequence.

1. **Initialization (__init__):**
   - `n_state` represents the number of hidden states, `n_observation` represents the number of observation states.
   - `initial_prob` is the initial probability vector, `transition_prob` is the transition probability matrix, `emission_prob` is the emission probability matrix, they are all initialized to a uniform distribution.

2. **Train:**
   - `observations` is a list of observation sequences used for training.
   - In the `train` method, the E step is first performed, which includes the forward algorithm and the backward algorithm. The forward algorithm calculates the forward probability on each hidden state at each time step, and the backward algorithm calculates the backward probability on each hidden state at each time step.
   - Then proceed to the M step to update the model's initial probability, state transition probability and emission probability through forward and backward probabilities. This process is carried out iteratively so that the model parameters gradually converge.

3. **Predict:**
   - `observation` is the given observation sequence.
   - In the `predict` method, the Viterbi algorithm is used to calculate the most likely hidden state sequence given the observation sequence. The Viterbi algorithm calculates the maximum probability of each hidden state at each time step and the path to reach this probability through dynamic programming.
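To round things off, here is a minimal usage sketch of the class above. The toy sequences are made up, and keep in mind that this implementation uses unscaled forward/backward probabilities and a uniform initialization, so it is only a demonstration rather than something tuned for real data:

```python
# Hypothetical toy data: 2 hidden states, 3 possible observation symbols.
observations = [
    [0, 1, 2, 1, 0],
    [2, 2, 1, 0, 0],
    [0, 0, 1, 2, 2],
]

model = HMM(n_state=2, n_observation=3)
model.train(observations, iterations=50)

print("initial_prob:", model.initial_prob)
print("transition_prob:\n", model.transition_prob)
print("emission_prob:\n", model.emission_prob)

# Most likely hidden state sequence for a new observation sequence
print("decoded states:", model.predict([0, 1, 2, 2]))
```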

Conclusion

In short, that's it. (Forget the beginning, forget the middle, forget the end.)
