Bidirectional LSTM (Bi-LSTM) vs. ordinary LSTM: the formula process

LSTM is a variant of the Recurrent Neural Network (RNN) that memorizes sequence information by introducing "gates": an input gate, a forget gate, and an output gate.

The main difference between a bidirectional LSTM and an ordinary LSTM is that it processes the input sequence in both directions at the same time. For a given word, the bidirectional model does not only use the context carried forward from the preceding words; it also uses the context carried backward from the following words. It can therefore capture more complex language patterns, as the short example below illustrates.
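
A minimal sketch of this difference, using PyTorch's nn.LSTM purely for illustration (the tensor sizes here are made up): setting bidirectional=True runs the same sequence in both directions and concatenates the two hidden states, which is why the output feature dimension doubles.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
seq_len, batch, input_size, hidden_size = 7, 2, 10, 16
x = torch.randn(seq_len, batch, input_size)

uni = nn.LSTM(input_size, hidden_size)                      # ordinary (one-directional) LSTM
bi = nn.LSTM(input_size, hidden_size, bidirectional=True)   # bidirectional LSTM

out_uni, _ = uni(x)   # shape: (seq_len, batch, hidden_size)
out_bi, _ = bi(x)     # shape: (seq_len, batch, 2 * hidden_size), forward and backward states concatenated

print(out_uni.shape)  # torch.Size([7, 2, 16])
print(out_bi.shape)   # torch.Size([7, 2, 32])
```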

The following is the formula process of bidirectional LSTM:

Initialization:

Input vector: x_t
Hidden state: h_t; cell state: c_t
Output: y_t

Forward propagation:

Calculate the current input gate, forget gate and output gate based on the current input and the previous hidden state:
Input gate: i_t = sigmoid(W_xi * x_t + W_hi * h_{t-1} + b_i)
Forget gate: f_t = sigmoid(W_xf * x_t + W_hf * h_{t-1} + b_f)
Output gate: o_t = sigmoid(W_xo * x_t + W_ho * h_{t-1} + b_o)
Calculate the candidate cell state from the current input and the previous hidden state: candidate c_t = tanh(W_xc * x_t + W_hc * h_{t-1} + b_c)
Update the current memory cell state (element-wise products): c_t = f_t * c_{t-1} + i_t * candidate c_t
Update the current hidden state: h_t = o_t * tanh(c_t)
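
Below is a minimal NumPy sketch of a single forward LSTM step following the equations above; the parameter dictionary keys mirror the weight names in the text and are only an assumption about how one might store them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward LSTM step. p is a dict of weights/biases named as in the text:
    each W_x* has shape (hidden, input), each W_h* has shape (hidden, hidden)."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])      # input gate
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])      # forget gate
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])      # output gate
    c_cand = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_cand                                   # new cell state
    h_t = o_t * np.tanh(c_t)                                            # new hidden state
    return h_t, c_t
```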

Backward pass (reverse direction):

The same gate equations are applied, but in reverse time order: the backward LSTM starts at the last time step T and moves toward the first, so at time step t it uses the backward hidden state from time step t+1 instead of t-1. A sketch follows below.
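
Continuing the sketch and reusing lstm_step from above, either direction can be run by stepping through the sequence in the chosen time order; run_direction and its arguments are illustrative names, not a standard API.

```python
def run_direction(xs, params, hidden_size, reverse=False):
    """Run one LSTM direction over a list of input vectors xs.
    reverse=True processes the sequence from the last step to the first,
    which is the backward half of the bidirectional LSTM."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    hs = [None] * len(xs)              # hidden state stored at its original position t
    for t in order:
        h, c = lstm_step(xs[t], h, c, params)
        hs[t] = h
    return hs
```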

Predict the next word:

For each time step, concatenate the forward and backward hidden states: combined_h_t = [forward h_t ; backward h_t]
Then use the softmax function to compute the probability distribution over words: y_t = softmax(W * combined_h_t + b)
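
Finally, a sketch of the prediction step under the same assumptions, reusing run_direction from above: the forward and backward hidden states at each time step are concatenated and passed through a softmax output layer (W_out and b_out are hypothetical output-layer parameters).

```python
def softmax(z):
    z = z - np.max(z)                  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_distribution(xs, fwd_params, bwd_params, W_out, b_out, hidden_size):
    """Concatenate forward and backward hidden states at each step and
    map them to a probability distribution over the vocabulary."""
    h_fwd = run_direction(xs, fwd_params, hidden_size, reverse=False)
    h_bwd = run_direction(xs, bwd_params, hidden_size, reverse=True)
    ys = []
    for t in range(len(xs)):
        combined_h_t = np.concatenate([h_fwd[t], h_bwd[t]])   # [forward h_t ; backward h_t]
        ys.append(softmax(W_out @ combined_h_t + b_out))      # y_t
    return ys
```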

The above is the formula process of bidirectional LSTM.

Compared with an ordinary LSTM, the advantage of a bidirectional LSTM is that it captures both the preceding and the following context in the input sequence. For tasks that require understanding context, such as translation, summarization, and sentiment analysis, a bidirectional LSTM therefore usually performs better than an ordinary LSTM.

The disadvantage of a bidirectional LSTM is that it requires more computing resources and memory, because it must process the entire sequence in both directions rather than only the words seen so far. In addition, since every output depends on both the preceding and the following context, it is tied more closely to the whole input sequence, and small changes anywhere in the sequence can affect the predictions.

Despite these shortcomings, bidirectional LSTM has achieved remarkable success in many natural language processing tasks and has become the basic architecture of many models.

Origin blog.csdn.net/xifenglie123321/article/details/132666978