[Dry goods] Why can a recurrent neural network (RNN) memorize historical information?

This note follows the previous one, [Intuitive Understanding] Understanding the Basics of RNN (Recurrent Neural Network), and records why an RNN can remember previous historical information and how this is reflected in its formulas.

Let's first explain why an ordinary neural network cannot remember previous historical information, and then show how the RNN gains exactly this ability; after all, if an ordinary neural network could record historical information, the idea of the RNN would never have arisen.

1

Ordinary Neural Network (MLP)

First of all, we have a task: part-of-speech tagging. Below are two training sentences.
He confessed to me.
I think his confession is not sincere enough.

The correct parts of speech are:

[Figure: the two sentences annotated with their word-by-word part-of-speech tags]
We then feed these training data to the neural network one word at a time. For the first pair, "he/r", the network learns the mapping "he -> r"; for the second pair, "to/p", it learns the mapping "to -> p"; and so on, until all the training data have been seen and the parameters have received their final updates, giving us the trained model. But here comes the problem.
The learning example diagram is as follows:

[Figure: an ordinary neural network learning isolated word -> tag mappings]

In the training data above, the part of speech of some words is not unique. For example, "confession" is used as a verb (v) in the sentence "He confessed to me", but as a noun (n) in the sentence "I think his confession is not sincere enough". For the neural network, this is confusing.

One moment the network has to learn that "confession" is a verb, and the next moment that it is a noun. The network is quite innocent here: it has no way to decide when "confession" should be judged a noun and when a verb, because it cannot see the surrounding context. Each data point fed to the network is unrelated to the data that came before it.
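To make the conflict concrete, here is a minimal Python sketch of the kind of (word, tag) pairs such a network is fed, using the r/v/p/n style tags from the example above; the exact tokens and tags are illustrative assumptions, not the article's actual dataset:

```python
from collections import defaultdict

# Illustrative (word, tag) training pairs for the two sentences above.
# Tags follow the article's example: r = pronoun, v = verb,
# p = preposition, n = noun. In the original Chinese, "confession" (表白)
# is literally the same token in both sentences.
train_pairs = [
    # "He confessed to me."
    ("he", "r"), ("confession", "v"), ("to", "p"), ("me", "r"),
    # "I think his confession is not sincere enough."
    ("I", "r"), ("think", "v"), ("his", "r"), ("confession", "n"),
]

# Group the labels by word: a context-free model sees only the word
# itself, so "confession" arrives with two contradictory targets.
labels = defaultdict(set)
for word, tag in train_pairs:
    labels[word].add(tag)

print(labels["confession"])  # {'v', 'n'} -- one input, two different labels
```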

So what we need at this point is a network that can remember previous historical information. For example, in the first sentence, when the network encounters the word "confession", it knows that the word in front of it is "he", a pronoun, and the probability that the word following a pronoun is a verb is much greater than the probability that it is a noun. Of course, an RNN can also look at several words before the current one; in theory, an RNN can memorize all the words before the current word.

Similarly, in the second sentence, when the word "confession" is encountered, the network knows that the word in front of it is "的", a particle (rendered as "his" in the English sentence), so the probability that a word following a particle is a noun is much greater than the probability that it is a verb.

So we hope for a network that can memorize previous knowledge and use it to help complete the current prediction, and this is where the RNN takes the stage. Some readers may point out that it has many problems of its own, such as the inability to keep long-term memories, but this article does not cover those. In any case, the RNN offers a possible way to solve this problem.

2

How the Recurrent Neural Network (RNN) Records Historical Information

First, let's introduce the RNN.
Look at a simple recurrent neural network. It consists of an input layer, a hidden layer, and an output layer:
[Figure: abstract diagram of a simple RNN: input x, hidden layer s with a self-loop labeled W, output o]

I'm not sure whether newcomers can make sense of this picture; when I first started learning, I was certainly stunned by it. Does each node represent a single input value, or is each circle a whole layer of vector nodes? How can the hidden layer connect to itself? This picture is quite abstract, so let's unpack it.

Let's understand it this way. If we remove the circular arrow labeled W, it becomes the most common fully connected neural network. x is a vector representing the values of the input layer (the circles for the individual neuron nodes are not drawn here); s is a vector representing the values of the hidden layer (only one node is drawn for the hidden layer, but you can imagine this layer as actually having multiple nodes, with the number of nodes equal to the dimension of the vector s).

U is the weight matrix from the input layer to the hidden layer; o is also a vector, representing the values of the output layer; V is the weight matrix from the hidden layer to the output layer.

So, now let's see what W is. The value s of the hidden layer of the recurrent neural network depends not only on the current input x but also on the previous value of the hidden layer. The weight matrix W is the weight with which the previous value of the hidden layer enters the current computation as an extra input.
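For concreteness, a small worked example (the sizes here are illustrative assumptions, not taken from the figure): if x has dimension 4, s has dimension 3, and o has dimension 2, then U is a 3×4 matrix, W is a 3×3 matrix, and V is a 2×3 matrix, so that Ux, Ws, and Vs are all well-defined matrix-vector products feeding the hidden and output layers.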

Here is the concrete picture that corresponds to this abstract one:
[Figure: the same RNN drawn concretely, showing the previous hidden layer feeding into the current hidden layer]

From the figure above, we can clearly see how the hidden layer at the previous moment affects the hidden layer at the current moment.
If we unroll the figure above along the time axis, the recurrent neural network can also be drawn like this:

[Figure: the RNN unrolled through time, with inputs x_{t-1}, x_t, x_{t+1}, hidden states s_{t-1}, s_t, s_{t+1}, and outputs o_{t-1}, o_t, o_{t+1}]
Now it looks clearer. After the network receives the input x_t at time t, the value of the hidden layer is s_t and the output value is o_t. The key point is that s_t depends not only on x_t but also on s_{t-1}. We can express the computation of the recurrent neural network with the following formulas:

o_t = g(V s_t)
s_t = f(U x_t + W s_{t-1})
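As a sanity check of these two formulas, here is a minimal NumPy sketch of the forward pass, assuming tanh for f and softmax for g (common choices, though the article does not pin them down); the layer sizes and random initialization are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 3, 2            # hypothetical layer sizes
U = rng.normal(size=(n_hidden, n_in))      # input  -> hidden weights
W = rng.normal(size=(n_hidden, n_hidden))  # hidden -> hidden (recurrent) weights
V = rng.normal(size=(n_out, n_hidden))     # hidden -> output weights

def softmax(z):
    e = np.exp(z - z.max())                # shift for numerical stability
    return e / e.sum()

def rnn_forward(xs):
    """Run the RNN over a sequence of input vectors, one step per vector."""
    s = np.zeros(n_hidden)                 # initial hidden state s_0
    outputs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)         # s_t = f(U x_t + W s_{t-1})
        outputs.append(softmax(V @ s))     # o_t = g(V s_t)
    return outputs, s

xs = [rng.normal(size=n_in) for _ in range(5)]
outs, s_T = rnn_forward(xs)
print(len(outs), outs[-1])                 # one output per time step
```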

Then, if we substitute the second formula into the first and keep expanding s_{t-1}, s_{t-2}, ... in the same way, we get the following derivation:

o_t = g(V s_t)
    = g(V f(U x_t + W s_{t-1}))
    = g(V f(U x_t + W f(U x_{t-1} + W s_{t-2})))
    = g(V f(U x_t + W f(U x_{t-1} + W f(U x_{t-2} + W s_{t-3}))))
    = ...

As can be seen from the derivation above, the output at the current moment really does contain the historical inputs x_{t-1}, x_{t-2}, x_{t-3}, ..., which explains why the recurrent neural network (RNN) can memorize historical information; and many tasks do indeed need exactly this feature.
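We can also check this empirically. Reusing rnn_forward and xs from the sketch above, perturbing only the very first input changes the final hidden state, i.e. x_1 is still "remembered" at the last time step:

```python
# Perturb only the first input vector and rerun the forward pass.
xs2 = [x.copy() for x in xs]
xs2[0] = xs2[0] + 1.0                      # change x_1 only
_, s_T2 = rnn_forward(xs2)

# The final hidden state differs, so information about x_1 survived
# all the way to the last time step.
print(np.abs(s_T - s_T2).max())            # a clearly nonzero value
```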

Acknowledgements: Thanks to Xia Chong for the pictures~

Recommended reading:

Selected dry goods | A summary of the past six months' dry-goods catalog
[Optimization] Basics of linear programming
[Intuitive detailed explanation] What are PCA and SVD


Origin: blog.51cto.com/15009309/2553998