LSTM (Long Short-Term Memory Network) structural analysis and understanding

1. The LSTM structure diagram is as follows:

2. Understanding each block:
① The first step in our LSTM is to decide what information to discard from the cell state. This decision is made by a sigmoid layer called the forget gate. The gate reads h_{t-1} and x_t and outputs a number between 0 and 1 for each entry in the cell state C_{t-1}: 1 means "completely retain", 0 means "completely discard".
Let's return to the example of a language model that predicts the next word based on the words it has already seen. In this problem, the cell state may hold the gender of the current subject, so that the correct pronoun can be selected. When we see a new subject, we want to forget the gender of the old one.
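This step corresponds to the standard forget gate equation, where \sigma is the sigmoid function and [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the current input:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)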

② The next step is to decide what new information to store in the cell state. This has two parts.
First, a sigmoid layer called the "input gate" decides which values we will update.
Then, a tanh layer creates a vector of new candidate values, \tilde{C}_t, that could be added to the state. In the next step we combine these two pieces to produce an update to the state.
In our language model example, we want to add the gender of the new subject to the cell state, replacing the old one we are forgetting.
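These two parts are the standard input gate and candidate-value equations:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)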

③ Now it is time to update the old cell state: C_{t-1} is updated to C_t. The previous steps have already decided what to do; now we actually do it.
We multiply the old state by f_t, discarding the information we decided to discard. Then we add i_t * \tilde{C}_t, the new candidate values scaled by how much we decided to update each component of the state.
In the language model, this is where we actually drop the old subject's gender information and add the new subject's gender, based on the decisions made in the previous steps.
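In equation form, the update combines the two gates element-wise:

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t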

④ Finally, we need to decide what to output. This output is based on our cell state, but is a filtered version of it. First, we run a sigmoid layer that decides which parts of the cell state to output. Then we push the cell state through tanh (squashing its values to between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
In the language model example, since the model has just seen a pronoun, it may want to output information relevant to a verb. For example, it might output whether the pronoun is singular or plural, so that we know what form the verb should take if a verb comes next.
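The corresponding standard output equations are:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)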

Summary: The structure of the LSTM as described in Fei-Fei Li's Stanford course slides is as follows:

Matching the step-by-step analysis above, the correspondence between the diagram and the equations is summarized as follows:
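Putting the four steps together, here is a minimal sketch of a single LSTM time step in NumPy. The function and parameter names (lstm_step, W_f, b_f, and so on) are illustrative choices, and the weights below are random; this only makes the data flow of the equations above concrete, not a production implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the four gate equations above.

    Parameter names (W_f, b_f, ...) in the dict p are illustrative.
    """
    # [h_{t-1}, x_t]: concatenate previous hidden state and current input
    z = np.concatenate([h_prev, x_t])

    f_t   = sigmoid(p["W_f"] @ z + p["b_f"])   # forget gate
    i_t   = sigmoid(p["W_i"] @ z + p["b_i"])   # input gate
    c_hat = np.tanh(p["W_C"] @ z + p["b_C"])   # candidate values \tilde{C}_t
    o_t   = sigmoid(p["W_o"] @ z + p["b_o"])   # output gate

    c_t = f_t * c_prev + i_t * c_hat           # C_t: update the cell state
    h_t = o_t * np.tanh(c_t)                   # h_t: filtered output
    return h_t, c_t

# Example usage with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(0)
hidden, inp = 4, 3
p = {k: rng.standard_normal((hidden, hidden + inp)) * 0.1
     for k in ("W_f", "W_i", "W_C", "W_o")}
p.update({k: np.zeros(hidden) for k in ("b_f", "b_i", "b_C", "b_o")})

h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.standard_normal(inp), h, c, p)
print(h.shape, c.shape)  # (4,) (4,)
```

Note how each gate reads the same concatenated vector [h_{t-1}, x_t] but has its own weights; only the element-wise combination at the end distinguishes what is forgotten, written, and exposed.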

