Understanding and Getting Started with LSTM Network Models

Recently I have been working on a paper in my lab. The research direction is time-series prediction, and along the way I came across the LSTM model, so I am recording my notes here.
Below is my translation of colah's blog post Understanding LSTM Networks, which I worked through to help my own understanding. Reading it should give you a general picture of the LSTM model (please forgive my rough English):

Recurrent Neural Networks
People don't think about problems from scratch every time. As you read this article, you understand each word based on your understanding of the words that came before it. You don't throw everything away and start thinking from scratch again. Your thoughts have persistence.
Traditional neural networks cannot do this, which seems like a major shortcoming. For example, say you want to classify the kind of event happening at each point in time in a movie. It's unclear how a traditional neural network could use its reasoning about earlier events to inform predictions about later ones.
Recurrent Neural Networks solve this problem. They are networks with loops in them, which allow information to persist.

[Figure: a recurrent neural network with a loop]
In the figure above, the neural network block A takes some input x_t and outputs a value h_t. The loop allows information to be passed from one step of the network to the next.
These loops make RNNs look a little mysterious. However, if you think about it, they are not very different from normal neural networks. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:
[Figure: an unrolled recurrent neural network]
This chain-like nature reveals the close relationship between recurrent neural networks and sequences and lists. They are the natural neural network architecture for this kind of data.
And they really work! Over the past few years, RNNs have been applied to a variety of problems with incredible success: speech recognition, language modeling, translation, image captioning… the list goes on and on.
The key to these successes is the use of "LSTMs," a very special type of recurrent neural network that is much better than the standard version for many tasks. Almost all exciting results based on recurrent neural networks are achieved through them. This post will explore these LSTMs.

The Problem of Long-Term Dependencies
One of the appeals of RNNs is that they may be able to connect previous information to the current task, e.g. earlier video frames can help in understanding the current frame. If RNNs could do this, they would be extremely useful. But can they? It depends.
Sometimes we only need recent information to perform the task at hand. For example, consider a language model trying to predict the next word based on the previous ones. If we're trying to predict the last word in "the clouds are in the sky", we don't need any further context - it's obvious the next word is "sky". In such cases, where the gap between the relevant information and the place it's needed is small, RNNs can learn to use the past information.
[Figure: a small gap between the relevant information and the place it's needed]
But there are also cases where we need more context. Consider trying to predict the last word in "I grew up in France... I speak fluent French". The recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France from much further back. So the gap between the relevant information and the point where it is needed can become very large.
Unfortunately, as the gap widens, RNNs become unable to learn how to connect information.
[Figure: a large gap between the relevant information and the place it's needed]
In theory, RNNs are absolutely capable of handling such "long-term dependencies". A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don't seem to be able to learn them. Hochreiter and Bengio et al. delved into this issue and found some fundamental reasons why it is difficult.
Thankfully, LSTMs don't have this problem!

LSTM Networks
Long Short-Term Memory networks - often referred to as "LSTMs" - are a special kind of RNN capable of learning long-term dependencies. They were proposed by Hochreiter & Schmidhuber (1997) and were refined and popularized by many people in subsequent work. They work brilliantly on a wide variety of problems and are now widely used.
LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods is practically their default behavior, not something they struggle to learn!
All recurrent neural networks have the form of a chain of repeating modules of neural network. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.
[Figure: the repeating module in a standard RNN contains a single tanh layer]
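In standard notation, a single step of such a vanilla RNN boils down to:

h_t = \tanh(W \cdot [h_{t-1}, x_t] + b)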
LSTMs also have this chain-like structure, but the repeating module itself is different. Instead of a single neural network layer, there are four, interacting in a very particular way.
[Figure: the repeating module in an LSTM contains four interacting layers]
Don't worry about the details. We will walk through this diagram step by step later. For now, let's get familiar with the notation we'll be using.
[Figure: notation used in the diagrams (neural network layer, pointwise operation, vector transfer, concatenate, copy)]
In the diagram above, each line carries an entire vector, from the output of one node to the inputs of others. Pink circles represent pointwise operations, such as vector addition, while yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line splitting denotes its content being copied, with the copies going to different locations.

The Core Idea Behind LSTMs
The key idea of LSTMs is the cell state, the horizontal line that runs through the top of the diagram.
The cell state is a bit like a conveyor belt. It runs through the chain with only some minor linear interactions. Information can easily flow through in an unchanging fashion.
[Figure: the cell state, running horizontally through the top of the module]
The LSTM does have the ability to remove or add information to the cell state, carefully regulated by structures called gates.
Gates are a way to optionally let information through. They are composed of a sigmoid neural network layer and a pointwise multiplication operation.

[Figure: a gate, consisting of a sigmoid layer and a pointwise multiplication]
The sigmoid layer outputs a number between 0 and 1, describing how much of each component should be let through. A value of 0 means "let nothing through", while a value of 1 means "let everything through!"
An LSTM has three of these gates, which protect and control the cell state.
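As a small illustration of the gating idea (a sketch of my own, not from the original post; the array values and names are made up for the example), multiplying a candidate vector elementwise by a sigmoid output scales each component anywhere between "blocked" and "fully passed":

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pre-activations chosen so the sigmoid yields roughly 0.0, 0.5 and 1.0.
gate = sigmoid(np.array([-8.0, 0.0, 8.0]))
candidate = np.array([0.9, -0.4, 0.7])   # information trying to pass through

print(gate)               # [~0.00, 0.50, ~1.00]
print(gate * candidate)   # first component blocked, last passed almost unchanged
```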

Step-by-Step LSTM Walk Through
The first step in our LSTM is to decide what information to throw away from the cell state. This decision is made by a sigmoid layer called the "forget gate layer". It looks at h_{t-1} and x_t and outputs a number between 0 and 1 for each number in the cell state C_{t-1}. A 1 means "completely keep this", while a 0 means "completely get rid of this".
Let's go back to our previous example of trying to predict the next word based on all the previous ones. In such a problem, the cell state might include the gender of the current subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old subject.
[Figure: the forget gate layer]
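In the standard LSTM formulation, this forget gate is written as:

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)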
The next step is to decide what new information we want to store in the cell state. This has two parts. First, a sigmoid layer called the "input gate layer" decides which values we will update. Next, a tanh layer creates a vector of new candidate values, C̃_t, that could be added to the cell state. In the next step, we will combine the two to create an update to the cell state.
In our language model example, we want to add the gender of the new person to the cell state to replace the old person we forgot.
[Figure: the input gate layer and the tanh candidate layer]
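In the same notation, the two parts of this step are:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)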
It is time to update the old cell state C_{t-1} to the new cell state C_t. The previous steps have already decided what to do, we just need to actually do it.
We multiply the old state by f_t, forgetting the things we decided to forget earlier. Then we add i_t * C̃_t. These are the new candidate values, scaled by how much we decided to update each state value.
In terms of the language model, this is where we actually drop the information about the old subject's gender and add the new information, as we decided in the previous steps.
[Figure: updating the cell state]
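Putting the two previous steps together, the cell state update is (with * denoting elementwise multiplication):

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t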
Finally, we need to decide what we want to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer that decides which parts of the cell state we want to output. Then, we put the cell state through tanh (to push the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
Taking the language model as an example, since it has just seen a subject, it might want to output information relevant to a verb, in case that is what comes next. For example, it might output whether the subject is singular or plural, so that we know what form the verb should take (like "look" vs. "looks").
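In the same notation, the output step is:

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)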

Finally
The most basic LSTM model is roughly as described above. It's fair to say that colah has already explained it in great detail. His blog also covers some variants of the LSTM model and related conclusions, which I won't translate here. Interested readers can follow the original link at the end.
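For reference, here is a minimal NumPy sketch of a single LSTM step that collects the equations from the walkthrough above into code. The function name, weight layout, and toy usage are my own choices for illustration, not something from the original post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    """One forward step of a basic LSTM cell, following the walkthrough above."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)            # forget gate: what to throw away from C_{t-1}
    i_t = sigmoid(W_i @ z + b_i)            # input gate: which values to update
    c_tilde = np.tanh(W_C @ z + b_C)        # candidate values that could be added
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state
    o_t = sigmoid(W_o @ z + b_o)            # output gate: which parts of the state to expose
    h_t = o_t * np.tanh(c_t)                # filtered output
    return h_t, c_t

# Toy usage with random weights, just to show the shapes involved.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_f, W_i, W_C, W_o = (rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
                      for _ in range(4))
b_f, b_i, b_C, b_o = (np.zeros(hidden_size) for _ in range(4))

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):   # a length-5 input sequence
    h, c = lstm_step(x_t, h, c, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o)
print(h, c)
```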
This article is not original; it is only a personal translation. If it infringes on any rights, it will be removed upon request.
Original link:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
