Overview of Artificial Intelligence Algorithms (2) RNN and LSTM

 


Continued from the previous article: Overview of AI Algorithms (1)

RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory)

LSTM is a kind of RNN: the external structure is the same, and the difference lies mainly in the internal structure of the unit. In other words, LSTM is an internal modification that lets an RNN better handle natural language processing (NLP) problems.

I recommend the article Understanding LSTMs: https://www.jianshu.com/p/9dc9f41f0b29

It may explain things better than I can; here I will only give a brief account without going into too many technical details.

Let's first review the structure diagram of a basic neural network.

The overall structure is an input layer + N hidden layers + an output layer.

Data flows from left to right: the inputs X1, X2, X3 are distributed through these connections to the hidden layer; each node performs its calculation and produces an output, which is in turn distributed to the next layer.

If you zoom in on a single node, it looks like this.

Z is an intermediate node. This is the structure of a standard feedforward neural network.
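To make the forward pass concrete, here is a minimal numpy sketch (not from the original article; the layer sizes and random weights are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))    # input -> hidden weights
b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4))    # hidden -> output weights
b2 = np.zeros(2)

x = np.array([0.5, -1.0, 2.0])  # the inputs X1, X2, X3
h = sigmoid(W1 @ x + b1)        # node calculation at the hidden layer
y = sigmoid(W2 @ h + b2)        # output of the network
print(y)
```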

This structure is quite powerless against a certain class of problems: those where the current recognition result depends on the previous one.

A concrete scenario is natural language translation. An accurate translation has to take the surrounding context into account; it cannot be done sentence by sentence the way some machine translation systems work.

 

 

The so-called recurrent neural network (RNN) is a transformation of this intermediate node.

 

The transformation copies the output of the intermediate node, mixes it with the next input, and runs the calculation (activation function) again to produce the next result, repeating until there is no more input.
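A minimal sketch of that recurrence (the weight names W_xh and W_hh and the sizes are my own illustration, not any particular library's API):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # Mix the previous hidden state (the "copied output") with the
    # new input, then apply the activation function.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 5
W_xh = rng.normal(size=(hidden_size, input_size))
W_hh = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

h = np.zeros(hidden_size)                     # initial hidden state
for x_t in rng.normal(size=(4, input_size)):  # a sequence of 4 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # repeat until no more input
print(h)
```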

 

So what is the difference between LSTM and this RNN?

Of course, RNN is not a panacea; it has its own pros and cons.

One flaw: if the result at the current step T depends on the result of the previous step (T-1), there is no big problem. But what if it depends on the result at T-2, or a little further back, at T-10?

Look back at the structure diagram. If the result h20 depends on X1, the distance between them is long: the original input passes through many calculations before it reaches h20, so by then most of its signal has been lost, or in the opposite case amplified out of proportion (the vanishing/exploding gradient problem).
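A toy numeric illustration of the decay (assuming, purely hypothetically, that each step scales the signal by 0.5):

```python
# A signal repeatedly scaled by a factor below 1 (here a hypothetical
# per-step factor of 0.5) all but vanishes after 10 steps; a factor
# above 1 would instead blow up.
signal = 1.0
for step in range(10):
    signal *= 0.5
print(signal)  # ~0.00098: X1's contribution to h20 is nearly gone
```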

And this kind of long-range dependence on context is common in natural language processing. So…

Ta-dah...

LSTM came into being, for world peace.

Let's first look at the difference between LSTM and a regular RNN.

The change is mainly inside the green block; the external structure is the same.

 

This internal structure looks a bit like a circuit board and can be divided into three parts.

They are the "forget gate", the "input gate" and the "output gate".

I won't detail here how the three gates are implemented; the article linked above describes them thoroughly, and readers who enjoy the formulas can head over there.

Here I will just briefly explain why the forget, input, and output gates exist.
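Still, to make the three gates concrete, here is a minimal numpy sketch of one step of the standard LSTM cell (the weight names W, U, b and the sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])  # candidate state
    c = f * c_prev + i * g   # keep what is not forgotten, admit new input
    h = o * np.tanh(c)       # the output gate filters what leaves the cell
    return h, c

rng = np.random.default_rng(2)
n_in, n_h = 3, 4
W = {k: rng.normal(size=(n_h, n_in)) for k in "fiog"}
U = {k: rng.normal(size=(n_h, n_h)) for k in "fiog"}
b = {k: np.zeros(n_h) for k in "fiog"}

h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
print(h)
```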

Following the RNN discussion above: suppose the dependency is far away, say T depends on the output at T-10, with a galaxy of steps in between.

Want the result at T? The cell first forgets all the information from T-11, T-12 and earlier, then admits the input at T-10, then forgets the inputs at T-9, T-8, T-7, ..., and finally passes the stored information from T-10 through the output gate to get the result.

Then, idealized, the forget-gate values over the steps from T-11 to T would be:

Step:         T-11  T-10  T-9  T-8  T-7  T-6  T-5  T-4  T-3  T-2  T-1  T-0
Forget gate:   0     1     0    0    0    0    0    0    0    0    0    0

The input gate values would be:

Step:         T-11  T-10  T-9  T-8  T-7  T-6  T-5  T-4  T-3  T-2  T-1  T-0
Input gate:    0     0     0    0    0    0    0    0    0    0    0    1
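The arithmetic behind this is the cell-state update c_t = f_t * c_{t-1} + i_t * g_t: a forget gate of 1 carries the old memory through unchanged, while an input gate of 0 keeps new information from overwriting it. A toy illustration with made-up values:

```python
# Toy illustration of the LSTM cell-state update c_t = f*c_prev + i*g.
c_prev = 0.8             # hypothetical memory written to the cell at T-10
f, i, g = 1.0, 0.0, 0.3  # idealized gates: keep everything, admit nothing
c_t = f * c_prev + i * g
print(c_t)               # 0.8 -> the memory from T-10 survives unchanged
```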

 

 

Training an RNN is the process of adjusting these parameters, based on labeled data, until they capture the patterns in that data.
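As a rough sketch of that process (using PyTorch's nn.LSTM with made-up sizes and random stand-in data, just to show the parameter-adjustment loop):

```python
import torch
import torch.nn as nn

# Hypothetical setup: 3 input features, 8 hidden units, 1 target value.
model = nn.LSTM(input_size=3, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()),
                       lr=1e-2)
loss_fn = nn.MSELoss()

x = torch.randn(16, 10, 3)  # 16 sequences, 10 time steps, 3 features
y = torch.randn(16, 1)      # dummy targets

for epoch in range(100):
    out, (h_n, c_n) = model(x)  # h_n holds the final hidden state
    pred = head(h_n[-1])        # predict from the last hidden state
    loss = loss_fn(pred, y)
    opt.zero_grad()
    loss.backward()             # backpropagation through time
    opt.step()                  # adjust the gate parameters
```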

Well, that's it for RNN, thank you for reading!

Later I will cover GANs and give a review of transfer learning.
