LSTM (Long Short-Term Memory Network)

Table of contents

1. What is LSTM?

2. Detailed explanation of LSTM

 0. What is a gate?

1. Forget gate

2. Input gate

3. Update memory

4. Output gate


1. What is LSTM?

LSTM is a kind of RNN that addresses the short-term memory limitation of plain RNNs. When a sequence is long enough, an RNN has difficulty carrying information from earlier time steps to later ones, whereas an LSTM can learn long-term dependencies: it remembers information from earlier time steps, so context across the sequence can be linked.

For example:

1. An RNN can effectively predict a blank word such as "sky" from the few words immediately before it.

2. An RNN can infer from the words near the blank that the name of a language should be filled in, but not which language. What we want is for the model to work out that the answer should be "French" from the much earlier phrase "I grew up in France". Because the two positions are so far apart in the sequence, an RNN cannot use that historical information effectively, so we rely on an LSTM instead.

2. Detailed explanation of LSTM

An LSTM has three gates: the forget gate, the input gate (sometimes also called the update gate or candidate gate; the function is the same), and the output gate. These gates control which information should be forgotten and discarded and which should be retained or passed through unchanged, much like a valve on a pipe controlling what flows and what does not.

Let's first look at an unrolled diagram of the LSTM structure, and then take apart the three gates and their functions one by one.

Symbol description:

x_{t}: the input at the current time step

h_{t-1}: the hidden state from the previous time step

h_{t}: the hidden state passed to the next time step

\sigma: the sigmoid function, which maps its input to a value in the range (0, 1)

tanh: the tanh function, which maps its input to a value in the range [-1, 1]

Features of the LSTM:

Compared with an RNN, an LSTM has one extra state, the cell state, which some people call the memory cell.

The horizontal line in the figure below carries the cell state from the previous time step, C_{t-1}. Through the control of the three gates and a few linear operations, it is decided which information is forgotten and discarded and which is retained or left unchanged, producing the cell state at the current time step, C_{t}, which is then passed on to the next time step to update the memory.

This is only the cell-state update at a single time step. Over a whole long sequence, the cell state at each time step selectively forgets and discards information from the previous step, or retains it unchanged, and updates the memory. Carrying information from C_{1} all the way to C_{t} is how long-term memory is achieved.

 0. What is a gate?

The LSTM controls what the cell state forgets and what it adds through gates. A gate is a sigmoid layer whose output is multiplied element-wise with the cell state (or another vector). The figure below shows a gate.

The sigmoid layer outputs values in the range (0, 1), which serve as a gating signal: multiplying anything by 0 gives 0, meaning that component cannot pass and the memory is forgotten, while 1 means it passes fully and the memory is retained.
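To make the gating idea concrete, here is a minimal NumPy sketch (the vectors and values are made up for illustration): a sigmoid output multiplies a cell state element-wise, so components gated near 0 are forgotten and components gated near 1 pass through.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1) -- the gating signal.
    return 1.0 / (1.0 + np.exp(-z))

# A hypothetical 4-component cell state and the raw inputs to a sigmoid layer.
cell_state = np.array([2.0, -1.5, 0.7, 3.0])
gate = sigmoid(np.array([-10.0, 10.0, 0.0, 10.0]))  # roughly [0, 1, 0.5, 1]

# Element-wise multiplication: a gate value near 0 blocks ("forgets") a
# component, a value near 1 lets it pass ("retains" it).
gated = gate * cell_state
```

Here `gated` is roughly [0, -1.5, 0.35, 3.0]: the first component is wiped out, the third is halved, and the others are kept almost unchanged.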

1. Forget gate

The forget gate acts on the cell state C_{t-1}. Its job is to control selective forgetting: deciding which part of the cell state should be discarded and which retained. How this selective forgetting is controlled is easiest to see from the formula:

The formula for the forget gate:

f_{t} = \sigma(W_{f} · [h_{t-1}, x_{t}] + b_{f})

Concatenate h_{t-1} and x_{t}, multiply the result by the weight matrix W_{f}, add the bias b_{f}, and pass it through the sigmoid function. This yields f_{t}, a vector with the same dimension as C_{t-1}, for example [0, 0, 1, 0]. Each value in f_{t} determines whether the corresponding component of C_{t-1} is forgotten or retained: 0 means completely discarded, 1 means completely retained.

For example, consider a language model that predicts the next word from all previous words. The cell state may have remembered the gender of the current character so that the right pronoun (he or she) can be predicted later, but when a new character appears, the previous character's gender needs to be forgotten.

Example:

Xiaoming is a handsome boy, and Xiaomei likes... When processing "Xiaomei", the model needs to selectively forget the earlier "Xiaoming", or at least reduce that word's effect on the words that follow.
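The forget-gate formula can be sketched in NumPy as follows; the sizes and the randomly initialized W_f and b_f are purely illustrative, not trained values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3

# Illustrative parameters; shapes follow f_t = sigmoid(W_f @ [h_{t-1}, x_t] + b_f).
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

h_prev = rng.standard_normal(hidden_size)  # h_{t-1}
x_t = rng.standard_normal(input_size)      # x_t
C_prev = rng.standard_normal(hidden_size)  # C_{t-1}

concat = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
f_t = sigmoid(W_f @ concat + b_f)          # one gate value in (0, 1) per component
kept = f_t * C_prev                        # element-wise selective forgetting
```

Each component of `f_t` lies strictly between 0 and 1, so in practice the gate scales memories down rather than switching them off exactly.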

2. Input gate

The input gate also acts on the cell state. Its job is to decide which new information to store in the cell state, that is, which new memories the cell state selectively adds. How is this decided? The input gate works in two parts.

 First part:

 Build the input gate itself, which determines which information is added to the cell state as new memory.

The formula for the input gate:

i_{t} = \sigma(W_{i} · [h_{t-1}, x_{t}] + b_{i})

As with the forget gate, a linear operation followed by the sigmoid yields i_{t}, a vector with the same dimension as \tilde{C}_{t}; for example, i_{t} = [0, 0, 1, 0]. Each value in i_{t} determines whether the corresponding component of \tilde{C}_{t} is discarded or retained: 0 means completely discarded, 1 means completely retained.

 Second part:

Construct a candidate cell state \tilde{C}_{t}, which holds the information from x_{t} and h_{t-1}, and then multiply it element-wise by i_{t} to decide which of those memories are useful. As before, a 0 in i_{t} means completely discarded and a 1 means completely retained, and the retained information is added to the cell state as new memory. \tilde{C}_{t} is called a candidate cell state because it is merely waiting to be selected; only the useful parts become new memory.

The formula for the candidate cell state \tilde{C}_{t}:

\tilde{C}_{t} = tanh(W_{C} · [h_{t-1}, x_{t}] + b_{C})

Following the previous example:

This step introduces new information, for example "Xiaomei likes to wear skirts". The input gate filters the useful information out of it in order to remember the new character's gender: it remembers that Xiaomei is female and adds that as new memory.
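Both parts of the input gate can be sketched together in NumPy (again with made-up sizes and randomly initialized parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3
concat = rng.standard_normal(hidden_size + input_size)  # [h_{t-1}, x_t]

# Illustrative parameters for the two parts of the input gate.
W_i = rng.standard_normal((hidden_size, hidden_size + input_size))
b_i = np.zeros(hidden_size)
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))
b_C = np.zeros(hidden_size)

i_t = sigmoid(W_i @ concat + b_i)      # part 1: which components to admit, in (0, 1)
C_tilde = np.tanh(W_C @ concat + b_C)  # part 2: candidate cell state, in (-1, 1)

new_memory = i_t * C_tilde             # the filtered candidate to be added
```

Note that the two parts use separate weight matrices: the sigmoid decides *how much* of each component to admit, while the tanh proposes *what* the new content is.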

3. Update memory

The memory in the new cell state consists of two parts:

One part is the old memory that remains after the cell state from the previous time step has forgotten what is useless.

The other part is the useful information filtered out by the input gate as new memory.

The formula for the new cell state:

C_{t} = f_{t} * C_{t-1} + i_{t} * \tilde{C}_{t}

The element-wise product f_{t} * C_{t-1} is the old memory after forgetting, and i_{t} * \tilde{C}_{t} is the new memory to be added; their sum is the new cell state.

Continue with the example:

Xiaoming is a handsome boy, and Xiaomei likes to wear skirts. This step forgets that Xiaoming's gender is male and remembers that Xiaomei's gender is female.
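With the gates already computed, the update itself is just two element-wise products and a sum. A tiny numeric example (all values invented for illustration):

```python
import numpy as np

# Suppose the gates have already been computed for a 4-component cell state.
C_prev  = np.array([1.0, -2.0,  0.5, 3.0])  # C_{t-1}
f_t     = np.array([0.0,  1.0,  1.0, 0.5])  # forget gate output
i_t     = np.array([1.0,  0.0,  0.5, 1.0])  # input gate output
C_tilde = np.array([0.8,  0.3, -0.4, 0.2])  # candidate cell state

# C_t = f_t * C_{t-1} + i_t * C_tilde (all element-wise)
C_t = f_t * C_prev + i_t * C_tilde          # -> [0.8, -2.0, 0.3, 1.7]
```

The first component of the old memory is fully forgotten and replaced by the candidate's 0.8; the second is kept unchanged; the last two are blends. (Gate values of exactly 0 or 1 are an idealization; real sigmoid outputs only approach them.)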

4. Output gate

The output gate determines the final output, h_{t}. This output is based on the new cell state C_{t} and is computed in two parts.

First part:

A sigmoid layer produces o_{t}, for example o_{t} = [0, 0, 0, 1], which determines which part of the cell state needs to be output; only the required memory is output, not all of it.

The formula for o_{t}:

o_{t} = \sigma(W_{o} · [h_{t-1}, x_{t}] + b_{o})

 Second part:

Pass the new cell state through the tanh function, which maps its values into [-1, 1], and then multiply element-wise by o_{t} to control which part is output.

The formula for h_{t}:

h_{t} = o_{t} * tanh(C_{t})
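Putting the four formulas together, one full LSTM time step can be sketched as a single function. This is a minimal NumPy illustration with randomly initialized toy parameters, not a production implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step following the formulas above; `p` holds the
    (W, b) pairs for the forget, input, candidate, and output layers."""
    concat = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ concat + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ concat + p["b_i"])      # input gate
    C_tilde = np.tanh(p["W_C"] @ concat + p["b_C"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde               # update memory
    o_t = sigmoid(p["W_o"] @ concat + p["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                         # new hidden state
    return h_t, C_t

# Toy sizes and randomly initialized illustrative parameters.
rng = np.random.default_rng(42)
hidden_size, input_size = 4, 3
p = {}
for name in ("f", "i", "C", "o"):
    p["W_" + name] = 0.1 * rng.standard_normal((hidden_size, hidden_size + input_size))
    p["b_" + name] = np.zeros(hidden_size)

h = np.zeros(hidden_size)
C = np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):  # run a 5-step sequence
    h, C = lstm_step(x_t, h, C, p)
```

Because h_{t} = o_{t} * tanh(C_{t}) with o_{t} in (0, 1), every component of the hidden state stays in (-1, 1), while the cell state C_{t} itself is unbounded and carries the long-term memory.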

 


Origin blog.csdn.net/Michale_L/article/details/122782164