A simple understanding of the LSTM neural network

Hello everyone, my name is Dong Dongcan.

When I first started working as a developer, code had to be reviewed by the senior engineers before it could be submitted. After reading my code, they would always type "LGTM" in the comments.

As a novice at the time, I naively thought the reviewer considered my code pretty good and was jokingly praising me, as if saying "bro, you're so fierce."

Later I learned that this is review slang: Looks Good To Me (LGTM), meaning "this looks fine to me."

Later still, when I was learning algorithms, I ran into LSTM and wondered what that one meant.

1. What is LSTM

LSTM (Long Short-Term Memory) is a special kind of recurrent neural network. Yesterday's article introduced Seq2Seq; an LSTM is a natural building block for a Seq2Seq structure, often serving as its encoder and decoder.

This kind of network is mainly used for tasks with a time-series structure, such as text translation and text-to-speech.
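To make this concrete, here is a minimal sketch of running a time series through an LSTM using PyTorch; the layer sizes and data are arbitrary toy values, not anything from the article.

```python
import torch
import torch.nn as nn

# A toy LSTM: each time step takes a 10-dim input; the cell keeps a 20-dim state.
lstm = nn.LSTM(input_size=10, hidden_size=20)

# One sequence of 5 time steps (shape: seq_len, batch, input_size).
x = torch.randn(5, 1, 10)

# output: the hidden state at every time step;
# (h_n, c_n): the final hidden state and cell state after the last step.
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([5, 1, 20])
```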

There are actually a lot of articles about LSTM on the Internet, but studying from them made me uncomfortable, because many of them just throw out the formulas, talk about the three gates, and call it a day.

As a novice, I came away from those articles having learned very little, with no intuitive feel for the algorithm.

Later I dug through a lot of material, and somewhere, I can no longer remember which article or video, I found a detailed explanation of this algorithm that left a deep impression on me. That is what I will share with you today.

Please follow the line of thought below. The article is not long, and I hope that after reading it you will see LSTM in a new light.

2. Understanding LSTM through an exam example

First, let's set up a scenario. We are college students in the middle of final exams: we have just finished the linear algebra exam, and next up is the advanced mathematics exam.

As students, it is natural for us to start reviewing (learning) advanced mathematics content.

In this scenario, the task has exactly the time-series flavor that LSTM handles: after finishing the linear algebra exam, we move on to studying advanced mathematics.

Let's take a look at how an LSTM can learn advanced mathematics the way a human does. I don't plan to go into too many technical details, but some concepts in LSTM still need to be explained through the example.

First of all, the structure of LSTM is roughly as follows.

[Figure: the overall LSTM structure, a chain of repeating cells ("boxes"), each passing a cell state (top arrow) and a hidden state (bottom arrow) to the next]

Let's look at just one box in the middle. It receives two outputs from the previous box: one is the cell state output at the previous step (Ct-1), corresponding to the arrow along the top; the other is the hidden state output at the previous step (ht-1), corresponding to the arrow along the bottom. It also accepts a new input, xt.
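In code terms, each box is simply a function of three inputs returning two outputs. Here is a sketch of just the interface (the internals are filled in gate by gate below, and assembled at the end):

```python
def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One LSTM box.

    x_t:    this step's new input (the subject we are now studying)
    h_prev: hidden state from the previous box (bottom arrow)
    c_prev: cell state from the previous box (top arrow)

    Returns (h_t, c_t) for the next box.
    """
    ...  # forget gate, input gate, output gate: see the sections below
```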

Okay, let's start here.

We are now preparing for the advanced mathematics exam and studying advanced mathematics.

We definitely want to remember everything related to advanced mathematics and forget everything unrelated to it. Ideally, when we sit the exam, our brain is full of advanced mathematics and everything else, physics, chemistry, and the rest, has been forgotten.

Let’s analyze it from the leftmost side of the large middle box.

First of all, at this moment we receive the outputs of the previous cell: at the previous time step we took the linear algebra exam, so the incoming cell state is the state of our brain right after finishing that exam.

So what do we most want to do right now? Of course, forget everything learned before that has nothing to do with the advanced mathematics exam (selective forgetting).

Why is it called selective forgetting?

Our last exam was linear algebra, and the next one is advanced mathematics. The two subjects actually share quite a bit of knowledge, so we want to keep the related parts and forget the unrelated ones.

If our last exam had been English instead, then almost none of that knowledge would carry over, and we could forget nearly all of it.

So how do we selectively forget the cell state coming from the previous box? Here we meet the first gate in the LSTM structure: the forget gate.

Forget gate

[Figure: the forget gate, a sigmoid activation whose output is multiplied element-wise with the previous cell state Ct-1]

As the figure shows, the forget gate consists of an activation function followed by an element-wise multiplication.

It takes the current input (xt, the advanced mathematics we are reviewing) together with the hidden state of the previous box (ht-1, our brain state right after the last exam), passes them through the activation function, and multiplies the result element-wise with the previous cell state (Ct-1).

Let's explain this process vividly: we have the advanced mathematics content we just studied (xt), while part of the earlier linear algebra content still lingers in the brain as the hidden state (ht-1). Both go through the activation function, which performs the selective retention: whichever parts receive larger weights end up keeping more of their information.

So at this step, reviewing advanced mathematics diligently versus not reviewing it at all leads to different weights, and diligent review means the advanced-mathematics-related information receives the larger ones.

After the activation function, what we consider retained is mostly the information relevant to advanced mathematics.

This result is then multiplied element-wise with the cell state from the moment the last exam ended (Ct-1). What comes out is the advanced-mathematics-relevant information (which continues to flow onward), while everything unrelated is scaled to nearly zero, i.e. forgotten.
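In the usual textbook notation, the forget gate computes f_t = sigmoid(W_f · [h_{t-1}, x_t] + b_f) and multiplies it element-wise with C_{t-1}. A minimal NumPy sketch (sizes and random weights are toy values; a trained network would have learned W_f and b_f):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy sizes: a 10-dim input x_t and a 20-dim hidden/cell state.
rng = np.random.default_rng(0)
W_f, b_f = rng.normal(size=(20, 30)), np.zeros(20)

x_t = rng.normal(size=10)      # what we just studied (advanced math)
h_prev = rng.normal(size=20)   # brain state after the last exam
c_prev = rng.normal(size=20)   # accumulated knowledge so far

hx = np.concatenate([h_prev, x_t])
f_t = sigmoid(W_f @ hx + b_f)  # each value in (0, 1): 1 = keep, 0 = forget

c_kept = f_t * c_prev          # irrelevant memories are scaled toward zero
```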

At this point we have forgotten everything we should have forgotten, but for the advanced mathematics exam, forgetting (clearing useless information out of the brain) is not enough. What matters more is to remember the advanced mathematics we have just learned (xt).

So we need to write the newly learned advanced mathematics into the brain, that is, the LSTM needs to take in the new knowledge, and that brings us to the second gate: the input gate.

Input gate

The name says it all: it feeds in the knowledge this step wants to learn, hence "input gate."

[Figure: the input gate, a sigmoid activation and a tanh branch whose outputs are multiplied element-wise]

Looking at the figure above, the newly learned advanced mathematics (xt) is combined with the previous hidden state (ht-1); this combination goes through a sigmoid activation on one branch and a tanh on the other, and the two results are multiplied element-wise.

The difference from the forget gate is where the activation's output lands: the forget gate's sigmoid scales the previous cell state, while the input gate's sigmoid scales the output of the tanh.

Intuitively, this selects among the advanced mathematics we studied this time (not everything we studied will appear on the exam); the multiplication acts as an information filter, and its output is the distilled advanced mathematics knowledge (the parts most likely to be tested).

Adding this to the information the forget gate let through gives the new knowledge base for the advanced mathematics exam. It contains both what carried over from the previous step (the still-relevant leftovers from linear algebra, such as general operations like addition, subtraction, multiplication, and division) and the distilled knowledge from reviewing advanced mathematics (such as calculus, which is almost guaranteed to be on the exam).
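Continuing the NumPy sketch from the forget gate (reusing sigmoid, rng, hx, and c_kept defined there), the input gate and the resulting cell-state update look roughly like this:

```python
# Input gate: a sigmoid decides how much of the new content to admit,
# a tanh produces the candidate content itself.
W_i, b_i = rng.normal(size=(20, 30)), np.zeros(20)
W_c, b_c = rng.normal(size=(20, 30)), np.zeros(20)

i_t = sigmoid(W_i @ hx + b_i)      # 0..1: how much of each piece to let in
c_tilde = np.tanh(W_c @ hx + b_c)  # the candidate newly studied knowledge

# New cell state: what the forget gate kept, plus the filtered new content.
c_t = c_kept + i_t * c_tilde
```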

At this point we are basically ready to take the exam. Enter the output gate.

Output gate

[Figure: the new cell state Ct flowing along the top of the cell straight to the next step]

The sum of the forget-gate and input-gate information (Ct) is passed straight along to the next cell.

[Figure: the output gate, a sigmoid activation multiplied by tanh(Ct) to produce the new hidden state ht]

The output gate adds one more branch: the combination of xt and ht-1 goes through a sigmoid activation, the result is multiplied by tanh(Ct), and the product is passed to the next cell as the hidden state ht.

So what is this doing?

Remember what our goal is? Taking the exam.

Think of it as actually sitting the advanced mathematics exam: using the knowledge distilled earlier plus what we learned this time, we work through a handful of exam questions and pass. This is one more round of information filtering; what we still remember afterwards, the few questions from the exam itself, is what gets passed to the next cell as the hidden state.
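Continuing the sketch once more (reusing sigmoid, rng, hx, and c_t from above), the output gate in the same notation:

```python
# Output gate: decide which parts of the updated knowledge base to
# actually "show" on the exam, and pass that on as the hidden state.
W_o, b_o = rng.normal(size=(20, 30)), np.zeros(20)

o_t = sigmoid(W_o @ hx + b_o)  # 0..1: which memories surface right now
h_t = o_t * np.tanh(c_t)       # hidden state handed to the next cell
```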

The next step might be a mathematical statistics exam, which may draw on this step's advanced mathematics as well as the linear algebra from the step before; the cycle then repeats until all the exams are over.
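Putting the three gates together, here is a self-contained sketch of one full cell step and the chain across several "exams"; the weight names follow the usual textbook notation, and the random initialization is only there to make it run (a trained network would have learned these weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, b):
    """One box: forget gate, input gate, and output gate, in that order."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ hx + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ hx + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ hx + b["c"])  # candidate new content
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ hx + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

# Toy sizes: 10-dim inputs, 20-dim state; weights are random stand-ins.
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(20, 30)) for k in "fico"}
b = {k: np.zeros(20) for k in "fico"}

h, c = np.zeros(20), np.zeros(20)
for x_t in rng.normal(size=(5, 10)):  # five "exams" in a row
    h, c = lstm_cell_step(x_t, h, c, W, b)
```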

Here we have used an exam example to briefly describe what the forget gate, input gate, and output gate each do, and how an LSTM achieves selective forgetting and information filtering. I hope it helps you learn LSTM.

As for why the forget gate forgets exactly the information we don't want, why the input gate distills the right information, and why the output gate delivers the best performance at exam time?

That comes down to training the LSTM network. When training converges, the network ends up with a set of weights that help the forget gate forget better, the input gate admit better, and the output gate output better.
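As a rough, hedged sketch of what that training looks like in practice, here is a made-up next-step-prediction toy task in PyTorch (not anything from the article); the point is only that backpropagation is what tunes all the gate weights:

```python
import torch
import torch.nn as nn

# Toy model: an LSTM plus a linear layer that reads out the hidden state.
lstm = nn.LSTM(input_size=1, hidden_size=16)
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-2)

x = torch.randn(20, 1, 1)    # a random toy sequence
target = x.roll(-1, dims=0)  # "predict the next step"

for step in range(100):
    out, _ = lstm(x)
    loss = nn.functional.mse_loss(head(out), target)
    opt.zero_grad()
    loss.backward()          # gradients flow into all the gate weights
    opt.step()
```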

Finally, there are technical details, such as why the sigmoid activation is chosen, that I won't go into here; if you're interested, they are easy to look up.

I hope that after reading this article, you have an intuitive feel for the LSTM algorithm.

Some of the explanations in this article come from articles or videos I read in the past whose source I can no longer find. If anyone knows the source, please leave a comment so I can credit the author.
