Recurrent Neural Network (RNN) Notes

Recurrent neural networks come in many varieties. Let's start with the simplest one: the basic recurrent neural network.

Basic recurrent neural network
Below is a simple recurrent neural network, consisting of an input layer, a hidden layer, and an output layer:
[Figure: a simple recurrent neural network with input layer x, hidden layer s, output layer o, and a circular arrow labeled W on the hidden layer]

What?! I suspect readers seeing this diagram for the first time collapse inwardly, just as I did. Recurrent neural networks are genuinely hard to draw, so everyone online resorts to this kind of abstract art. Stop and look more closely, though, and it is actually easy to understand.

If you remove the circle with the arrow labeled W, it becomes an ordinary fully connected neural network. x is a vector representing the values of the input layer (individual neuron nodes are not drawn); s is a vector representing the values of the hidden layer (only one node is drawn here, but you can imagine that this layer actually contains multiple nodes, as many as the dimension of the vector s); U is the weight matrix from the input layer to the hidden layer (see the third article in this series, Deep Learning from Scratch (3): Neural Networks and the Backpropagation Algorithm, for how matrix computations represent a fully connected network); o is a vector representing the values of the output layer; and V is the weight matrix from the hidden layer to the output layer.

So what is W? In a recurrent neural network, the value s of the hidden layer depends not only on the current input x but also on the value the hidden layer had at the previous time step. The weight matrix W gives the weights with which the hidden layer's previous value is fed back in as part of the current input.
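Written as formulas, this is the standard formulation of the basic RNN described above (the activation functions f and g are my labels here, not taken from the figure):

$$s_t = f(U x_t + W s_{t-1})$$
$$o_t = g(V s_t)$$

Here $s_t$ is the hidden-layer value at time step $t$ and $o_t$ is the output; the $W s_{t-1}$ term is exactly the circular arrow in the diagram.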

If we unroll the diagram above over time, the recurrent neural network can also be drawn like this:
[Figure: the same recurrent network unrolled through time, with inputs x_{t-1}, x_t, x_{t+1} feeding hidden states s_{t-1}, s_t, s_{t+1}]
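As a concrete sketch, here is a minimal NumPy forward pass over the unrolled network. The tanh activation and the toy dimensions are illustrative assumptions of mine, not taken from the figure:

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Forward pass of a basic RNN over a sequence of input vectors xs.

    s_t = tanh(U @ x_t + W @ s_{t-1});  o_t = V @ s_t (pre-softmax scores).
    """
    s = s0
    outs = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)   # hidden state mixes current input and previous state
        outs.append(V @ s)           # output depends only on the current hidden state
    return outs

# Toy dimensions: 4-dim inputs, 3-dim hidden state, 4-dim outputs.
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4))
W = rng.normal(size=(3, 3))
V = rng.normal(size=(4, 3))
xs = [rng.normal(size=4) for _ in range(5)]
outs = rnn_forward(xs, U, W, V, s0=np.zeros(3))
print(len(outs), outs[0].shape)  # 5 time steps, each output is 4-dim
```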
Bidirectional recurrent neural networks
For language models, looking only at the words that come before is often not enough. Consider, for example, the following sentence:

My phone is broken, I'm going to ____ a new phone.

Imagine that we only look at the words before the blank: my phone is broken, so am I going to repair it? Replace it with a new one? Or have a good cry? There is no way to tell. But if we also see that the words after the blank are "a new phone", then the probability that the blank should be filled with "buy" becomes much larger.

The basic recurrent neural network from the previous section cannot model this. We therefore need a bidirectional recurrent neural network, shown below:
[Figure: a bidirectional recurrent neural network, with one chain of hidden states running forward in time and another running backward]
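A minimal sketch of the idea (my own illustrative code, not from the original article): run one hidden state forward over the sequence, another backward, and combine both at each position, so every output sees context from both directions.

```python
import numpy as np

def birnn_forward(xs, Uf, Wf, Ub, Wb, V):
    """Bidirectional RNN: forward states see the past, backward states see the future."""
    h = np.zeros(Wf.shape[0])
    fwd = []
    for x in xs:                      # left-to-right pass
        h = np.tanh(Uf @ x + Wf @ h)
        fwd.append(h)
    h = np.zeros(Wb.shape[0])
    bwd = []
    for x in reversed(xs):            # right-to-left pass
        h = np.tanh(Ub @ x + Wb @ h)
        bwd.append(h)
    bwd.reverse()
    # At each time step, concatenate both directions before the output layer.
    return [V @ np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(0)
D, H = 4, 3
Uf, Ub = rng.normal(size=(H, D)), rng.normal(size=(H, D))
Wf, Wb = rng.normal(size=(H, H)), rng.normal(size=(H, H))
V = rng.normal(size=(5, 2 * H))   # output scores over a toy 5-word dictionary
xs = [rng.normal(size=D) for _ in range(6)]
print(len(birnn_forward(xs, Uf, Wf, Ub, Wb, V)))  # 6 outputs, one per position
```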
Vectorization
We know that the inputs and outputs of a neural network are vectors. For a neural network to process a language model, we first have to express each word as a vector.

The input to the neural network is a word, which we can vectorize with the following steps:

1. Build a dictionary containing all the words; each word in the dictionary has a unique index.
2. Any word can then be represented by an N-dimensional one-hot vector, where N is the number of words in the dictionary. Suppose a word's index in the dictionary is i, v is the word's vector, and $v_j$ is the j-th element of the vector; then:
$$v_j = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases}$$
Vectorizing this way, we get a high-dimensional sparse vector (sparse means that most of the elements are zero). Processing such vectors gives the neural network a huge number of parameters and a large amount of computation, so we often need a dimensionality-reduction method to turn the high-dimensional sparse vector into a dense low-dimensional vector.
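A tiny sketch of this vectorization, using a hypothetical five-word mini-dictionary (in practice N is in the tens of thousands, which is exactly the sparsity problem described above):

```python
import numpy as np

dictionary = ["<unk>", "phone", "broken", "buy", "new"]  # toy dictionary, N = 5
index = {word: i for i, word in enumerate(dictionary)}   # word -> unique index

def one_hot(word):
    """N-dimensional vector: 1 at the word's index, 0 everywhere else."""
    v = np.zeros(len(dictionary))
    v[index[word]] = 1.0
    return v

print(one_hot("buy"))  # [0. 0. 0. 1. 0.] -- sparse: all but one element is zero
```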

The output the language model needs is the most likely next word. We can have the recurrent neural network compute, for each word in the dictionary, the probability that it is the next word; the word with the highest probability is then the most likely next word. Thus the network's output is an N-dimensional vector, where each element is the probability that the corresponding dictionary word is the next word. As shown below:
[Figure: the N-dimensional output vector, each element giving the probability that the corresponding dictionary word is the next word]

Softmax layer

As mentioned earlier, a language model models the probability that a word occurs. So how do we make the neural network output probabilities? The method is to use a softmax layer as the network's output layer.

Let's look at the definition of the softmax function:
$$y_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$
This formula may look opaque, so let's work through an example. A softmax layer is shown below:
[Figure: a softmax layer with a 4-dimensional input vector and a 4-dimensional output vector]
We can see from the figure that the input to the softmax layer is a vector and the output is also a vector, and the two have the same dimension (4 in this example). The input vector x = [1 2 3 4] passes through the softmax layer and, after the softmax computation above, becomes the output vector y = [0.03 0.09 0.24 0.64]. The calculation is:
$$y_1 = \frac{e^{1}}{e^{1}+e^{2}+e^{3}+e^{4}} \approx 0.03, \quad y_2 = \frac{e^{2}}{e^{1}+e^{2}+e^{3}+e^{4}} \approx 0.09, \quad y_3 \approx 0.24, \quad y_4 \approx 0.64$$
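We can check this calculation with a few lines of NumPy (a sketch; subtracting the max is a standard numerical-stability trick and is not part of the formula above, but it leaves the result unchanged):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability; result unchanged
    return e / e.sum()

y = softmax(np.array([1.0, 2.0, 3.0, 4.0]))
print(np.round(y, 2))  # [0.03 0.09 0.24 0.64]
print(y.sum())         # 1.0 -- the outputs form a probability distribution
```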
Let's look at the features of the output vector y:

1. Every element is a positive number between 0 and 1;
2. The elements sum to 1.

It is easy to see that these are exactly the properties of a probability distribution, so we can treat the elements as probabilities. For a language model, we read the output as: the probability that the next word is the first word in the dictionary is 0.03, the probability that it is the second word is 0.09, and so on.
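Continuing the sketch with the toy dictionary from the vectorization section, picking the most likely next word is then just an argmax over the output probabilities (the probability values here are hypothetical):

```python
import numpy as np

dictionary = ["<unk>", "phone", "broken", "buy", "new"]
probs = np.array([0.03, 0.09, 0.24, 0.64, 0.0])  # hypothetical softmax output
print(dictionary[int(np.argmax(probs))])         # "buy" -- the most likely next word
```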

Source: blog.csdn.net/weixin_41845265/article/details/104300583