LSTM model

Table of contents

LSTM model

LSTM structure diagram

The core idea of LSTM

Cell state

Forget gate

Input gate

Output gate


LSTM model


What is the LSTM model?
LSTM (Long Short-Term Memory) is a variant of the traditional RNN. Compared with the classic RNN, it can effectively capture semantic associations across long sequences and alleviate vanishing or exploding gradients. The price is a more complex structure, whose core can be analyzed in four parts:
●Forget gate
●Input gate
●Cell state
●Output gate

LSTMs also have this chain structure, but the repeating unit differs from the unit in a standard RNN, which contains only a single network layer: each LSTM unit contains four interacting network layers. The structure of the LSTM is shown in the figure below.

LSTM structure diagram

The reason LSTM can solve the long-term dependency problem of the RNN is that it introduces a gate mechanism to control which features flow onward and which are discarded, so that a feature extracted at an early time step ti can still influence a much later time step tn.

Before explaining the detailed structure of the LSTM, we first define the meaning of each symbol in the figure:

Each yellow box represents a neural network layer, consisting of weights, biases, and an activation function; each pink circle represents an element-wise operation; an arrow represents the flow of a vector; merging arrows represent vector concatenation; forking arrows represent a vector being copied.

The core idea of LSTM

Compared with the hidden state of the original RNN, the LSTM adds a cell state. The figure marks the inputs and outputs of the LSTM at a single time step t:

We can first cover up the block in the middle and look only at the inputs and outputs of the LSTM at time t. There are three inputs: the cell state Ct-1, the hidden state ht-1, and the input vector xt at time t. There are two outputs: the cell state Ct and the hidden state ht, where ht also serves as the output at time t.

Cell state

At the heart of the LSTM is the cell state, represented by the horizontal line running through the cell. The cell state is like a conveyor belt: it runs through the entire chain with only a few minor branches, which makes it easy for information to flow through the whole network unchanged. The cell state is shown in the figure below:

The LSTM network can remove information from or add information to the cell state through structures called gates. Gates selectively decide which information passes through. The structure of a gate is in fact very simple: a sigmoid layer followed by an element-wise multiplication, as shown below:

Because the sigmoid layer outputs values between 0 and 1, each value represents how much of the corresponding information is allowed to flow through: 0 means nothing passes, 1 means everything passes.
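The sigmoid-plus-multiplication gate described above can be sketched in a few lines of NumPy (the numbers here are arbitrary toy values, chosen only to show the three regimes of the gate):

```python
import numpy as np

def sigmoid(z):
    # squashes each element into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# a toy gate: a sigmoid layer followed by an element-wise product
x = np.array([2.0, 0.0, -2.0])                 # information trying to flow through
gate = sigmoid(np.array([10.0, 0.0, -10.0]))   # gate values near 1, exactly 0.5, near 0
out = gate * x                                 # first component passes almost fully,
print(out)                                     # last is almost entirely blocked
```

Note that the gate never outputs exactly 0 or 1; it scales each component continuously, which is what keeps the whole unit differentiable.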

As mentioned earlier, the LSTM controls the cell state through three gates, called the forget gate, the input gate, and the output gate. Let's go through them one by one.

Forget gate

The first step in the LSTM is to decide what information to discard from the cell state. This operation is handled by a sigmoid unit called the forget gate. Looking at xt and ht-1, it outputs a vector of values between 0 and 1, where each value indicates how much of the corresponding component of the cell state Ct-1 is kept: 0 means fully discarded, 1 means fully kept. The forget gate is shown in the figure below.
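In the standard notation for LSTMs, the forget gate just described is written as:

```latex
f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)
```

where $\sigma$ is the sigmoid function, $W_f$ and $b_f$ are the weights and bias of the forget-gate layer, and $[h_{t-1}, x_t]$ is the concatenation explained below.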

First, a note on [ht−1, xt]: this means concatenating the two vectors into one (the same operation as numpy.concatenate).
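A concrete sketch of this concatenation and the forget-gate computation, where W_f and b_f are random placeholders rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3

h_prev = rng.standard_normal(hidden_size)   # hidden state h_{t-1}
x_t = rng.standard_normal(input_size)       # input vector x_t

# placeholder (untrained) forget-gate parameters, for illustration only
W_f = rng.standard_normal((hidden_size, hidden_size + input_size))
b_f = np.zeros(hidden_size)

z = np.concatenate([h_prev, x_t])                # [h_{t-1}, x_t], length 4 + 3 = 7
f_t = 1.0 / (1.0 + np.exp(-(W_f @ z + b_f)))     # forget gate, each entry in (0, 1)
```

The gate vector f_t has the same length as the cell state, so it can later be multiplied element-wise against Ct-1.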

Input gate

The next step is to decide what new information to add to the cell state. This is done in two steps.

First, ht-1 and xt are passed through a sigmoid layer called the input gate, which decides which values will be updated. Then ht-1 and xt are passed through a tanh layer to produce the candidate cell state Ct~, which may be added to the cell state. These two steps are shown in the figure below.

Next, the old cell state Ct-1 is updated to the new cell state Ct. The update rule: the forget gate selects how much of the old cell state to keep, the input gate selects how much of the candidate cell state Ct~ to add, and the sum of the two gives the new cell state Ct. The update operation is shown in the figure below.
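Putting the forget gate, input gate, and candidate together, the cell-state update can be sketched as follows (all weights are random placeholders, not trained values):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3
h_prev = rng.standard_normal(hidden_size)    # h_{t-1}
x_t = rng.standard_normal(input_size)        # x_t
C_prev = rng.standard_normal(hidden_size)    # old cell state C_{t-1}
z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]

# placeholder (untrained) parameters for the three layers involved
W_f = rng.standard_normal((hidden_size, hidden_size + input_size)); b_f = np.zeros(hidden_size)
W_i = rng.standard_normal((hidden_size, hidden_size + input_size)); b_i = np.zeros(hidden_size)
W_C = rng.standard_normal((hidden_size, hidden_size + input_size)); b_C = np.zeros(hidden_size)

f_t = sigmoid(W_f @ z + b_f)        # forget gate: how much of C_{t-1} to keep
i_t = sigmoid(W_i @ z + b_i)        # input gate: how much of the candidate to add
C_tilde = np.tanh(W_C @ z + b_C)    # candidate cell state Ct~
C_t = f_t * C_prev + i_t * C_tilde  # new cell state Ct
```

Because both gates lie in (0, 1), each component of Ct is a soft blend of the old state and the candidate rather than a hard overwrite.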

Output gate

After updating the cell state, the output is determined from the inputs ht-1 and xt. First, the inputs are passed through a sigmoid layer called the output gate to obtain a filter. Then the cell state is passed through a tanh layer to obtain a vector of values between -1 and 1, which is multiplied element-wise by the output-gate filter to obtain the final output ht of the LSTM unit. The steps are shown in the figure below:
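All four parts above combine into a single LSTM time step, which can be sketched as one function and run over a short sequence (parameter names and scaling are illustrative placeholders, not trained values):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, C_prev, p):
    """One LSTM time step; p is a dict of (placeholder) weight matrices and biases."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])      # input gate
    C_tilde = np.tanh(p["W_C"] @ z + p["b_C"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde          # cell state update
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                    # new hidden state (also the output)
    return h_t, C_t

rng = np.random.default_rng(42)
hidden_size, input_size = 4, 3
p = {}
for name in ("f", "i", "C", "o"):
    p[f"W_{name}"] = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.5
    p[f"b_{name}"] = np.zeros(hidden_size)

# run a short sequence through the cell, carrying h and C forward
h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for t in range(5):
    h, C = lstm_step(rng.standard_normal(input_size), h, C, p)
```

Note how h and C are threaded through the loop: this carried-forward pair is exactly the chain structure described earlier, and since ht = ot * tanh(Ct), every component of the output stays strictly inside (-1, 1).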


Reference video: [LSTM Long Short-Term Memory Network] The 3D model is clear at a glance, showing you the logic behind the algorithm (bilibili)

 


Origin blog.csdn.net/qq_38998213/article/details/132369988