Introduction to RNN and LSTM (BI-LSTM)

Reference: https://zhuanlan.zhihu.com/p/518848475
1. RNN
    An RNN (recurrent neural network) is, simply put, a network with a loop in it.
    In short, when computing the current step, the result of the previous step is fed back into the computation; this is what the "recurrence" means. The recurrence lets the network make full use of earlier information, which is why RNNs are well suited to language processing (language also depends on context semantics).
    Unrolling this recurrence over time makes the idea explicit: the same cell is applied at every time step, each time receiving the previous step's hidden state, as sketched below.
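    A minimal sketch of one recurrent step, written with PyTorch tensors (the weight names and sizes below are hypothetical, chosen only for illustration):

```python
import torch

# Hypothetical sizes, chosen only for illustration.
input_size, hidden_size = 8, 16
W_xh = torch.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden weights
b_h = torch.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One vanilla RNN step: the previous hidden state is fed back in."""
    return torch.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Unrolling over a sequence: the same cell is applied at every time step.
h = torch.zeros(hidden_size)
for x_t in torch.randn(10, input_size):   # a sequence of 10 input vectors
    h = rnn_step(x_t, h)
print(h.shape)  # torch.Size([16])
```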

2. LSTM
(1) Model structure
    As the amount of information (i.e., the sequence length) grows, RNNs run into problems such as vanishing gradients, exploding gradients, and poor handling of long-range dependencies. LSTM (Long Short-Term Memory) was proposed to address this. It is still a kind of RNN, but its network structure adds several extra modules.
    There are two paths through an LSTM, namely C (the cell state) and H (the hidden state).
    The cell has three inputs: the current input x(t), the hidden state h(t-1) from the previous step, and the cell state C(t-1) from the previous step. It has two outputs: the hidden state h(t) and the cell state C(t) computed at the current step.
    In layman's terms, the C path carries the memory that the gates control: it determines what information is kept and what is forgotten. In an LSTM, the C and H paths interact through "gates". Note that h(t) is determined not only by the previous state and the current input but also by the cell state C(t-1); this is the biggest difference from a plain RNN.
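    As a concrete illustration, the two paths show up directly in PyTorch's nn.LSTM API as the (h, c) state pair (a minimal sketch; the sizes below are arbitrary):

```python
import torch
import torch.nn as nn

# Arbitrary sizes, for illustration only.
batch, seq_len, input_size, hidden_size = 4, 10, 8, 16
lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

x = torch.randn(batch, seq_len, input_size)   # current inputs x(t)
h0 = torch.zeros(1, batch, hidden_size)       # previous hidden state h(t-1)
c0 = torch.zeros(1, batch, hidden_size)       # previous cell state C(t-1)

# Two outputs: hidden states for every step, plus the final (h, C) pair.
output, (hn, cn) = lstm(x, (h0, c0))
print(output.shape)          # (4, 10, 16): h(t) at each time step
print(hn.shape, cn.shape)    # (1, 4, 16) each: final h and C
```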

(2) Gate structure

①Forget gate
    The output of the forget gate's σ, f(t), lies between 0 and 1. f(t) is multiplied element-wise with C(t-1), so wherever f(t) is 0 the information at the corresponding position of the cell state is cleared, which is how forgetting is achieved.
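    Written out in the standard textbook formulation (the weight layout in the original figures may be arranged slightly differently), the forget gate is

$$f_t = \sigma\big(W_f \cdot [h_{t-1}, x_t] + b_f\big)$$

and the element-wise product f(t) ⊙ C(t-1) is what clears the forgotten positions of the cell state.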

②Update gate
    The update gate produces two quantities: i(t) and a candidate cell content (the tanh output). They are multiplied element-wise, and the result is added to the forget-filtered C(t-1) coming in from the previous step.
    i(t) is computed the same way as the σ in the forget gate, i.e., it decides which information should be kept, while the tanh squashes the candidate content to the range (-1, 1).
    Here, the cell-state information that survived the forget gate is combined with the newly obtained information to form the updated cell state.
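    In the same standard notation, the update (input) gate, the candidate content, and the resulting cell state are

$$i_t = \sigma\big(W_i \cdot [h_{t-1}, x_t] + b_i\big)$$

$$\tilde{C}_t = \tanh\big(W_C \cdot [h_{t-1}, x_t] + b_C\big)$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$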

③Output gate
    The output gate decides which part of the cell state is exposed as the new hidden state h(t). The exact formula used in PyTorch is written a little differently, but the idea is the same. Chaining the same cell step after step gives the overall network structure. In short, an LSTM remembers what needs to be remembered for a long time and forgets unimportant information.
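    In the same standard notation, the output gate and the new hidden state are

$$o_t = \sigma\big(W_o \cdot [h_{t-1}, x_t] + b_o\big)$$

$$h_t = o_t \odot \tanh(C_t)$$

    PyTorch's nn.LSTM implements the same idea but keeps separate input-to-hidden and hidden-to-hidden weight matrices (its weight_ih and weight_hh parameters), which is why its written form of the equations looks a bit different.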
3. BI-LSTM
    BI-LSTM stands for bidirectional long short-term memory. (Some LSTM variants also feed the previous cell state into the input gate, the forget gate, and the computation of the new candidate information at the same time.) For sequence modeling tasks, the future information and the historical information at each time step can be equally important, but a standard LSTM processes the sequence in order and cannot capture future information. The bidirectional LSTM adds a reverse (backward) LSTM layer to the original forward LSTM layer; in other words, a BI-LSTM computes over the sequence both forward and backward.
Referring to the illustration in https://blog.csdn.net/m0_59749089/article/details/128754246, a BI-LSTM is really two separate LSTMs whose results are concatenated and then post-processed; it is an enhanced form of LSTM. The two LSTMs do not share parameters, but they do share the same word-embedding (word vector) table.
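    A minimal sketch with PyTorch (sizes are arbitrary): setting bidirectional=True runs a forward and a backward LSTM over the same sequence and concatenates their outputs, so the output feature dimension doubles.

```python
import torch
import torch.nn as nn

# Arbitrary sizes, for illustration only.
batch, seq_len, input_size, hidden_size = 4, 10, 8, 16
bilstm = nn.LSTM(input_size, hidden_size, batch_first=True, bidirectional=True)

x = torch.randn(batch, seq_len, input_size)
output, (hn, cn) = bilstm(x)

# Forward and backward results are concatenated along the feature dimension.
print(output.shape)  # (4, 10, 32) = (batch, seq_len, 2 * hidden_size)
print(hn.shape)      # (2, 4, 16)  = (num_directions, batch, hidden_size)
```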

Origin: blog.csdn.net/daweq/article/details/129597565