Notes on the Principles of Recurrent Neural Networks
(1) Mathematical principles
A recurrent neural network (RNN) is a class of neural networks that takes sequence data as input, performs recursion along the direction in which the sequence evolves, and has all of its nodes (recurrent units) connected in a chain.
As shown in the figure below, the inputs are the tokenized words x⟨1⟩, x⟨2⟩, x⟨3⟩... (after preprocessing such as one-hot encoding), together with an initialized a⟨0⟩ (which can be a zero vector or a random vector). At each step, the activation a⟨1⟩ is computed from a⟨0⟩ and x⟨1⟩ via the weights Waa and Wax, and the prediction ŷ⟨1⟩ is then computed from a⟨1⟩ via Wya. Under this structure, each ŷ depends only on the inputs x at or before its own time step. The weights Waa, Wax, and Wya are shared across all time steps. See below for the calculation formulas.
Note: A bidirectional recurrent neural network makes the output at each step depend on both earlier and later inputs.
Calculation formula:
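The formula screenshot is not reproduced here; in the notation used throughout these notes (as usually written in Ng's course), the forward computation at step t is:

```latex
a^{\langle t \rangle} = g\!\left(W_{aa}\, a^{\langle t-1 \rangle} + W_{ax}\, x^{\langle t \rangle} + b_a\right),
\qquad
\hat{y}^{\langle t \rangle} = g\!\left(W_{ya}\, a^{\langle t \rangle} + b_y\right)
```

where g is typically tanh or ReLU for the activation and softmax (or sigmoid) for the output.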
Schematic diagram of RNN forward propagation:
The backpropagation of an RNN is backpropagation through time (BPTT): the gradient computation proceeds in reverse time order.
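As a concrete sketch of the forward pass described above (a minimal NumPy implementation; the dimensions, the softmax output layer, and all names are illustrative choices of this sketch, not taken from the course code):

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    # a<t> = tanh(Waa a<t-1> + Wax x<t> + ba): hidden-state update
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    # y_hat<t> = softmax(Wya a<t> + by), stabilized by subtracting the max
    z = Wya @ a_t + by
    y_hat = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return a_t, y_hat

rng = np.random.default_rng(0)
n_a, n_x, n_y = 4, 3, 3                       # toy hidden size / vocab sizes
Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
Wya = rng.standard_normal((n_y, n_a))
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

a = np.zeros((n_a, 1))                        # a<0>, initialized to zeros
x_seq = np.eye(n_x)                           # three one-hot "words"
for t in range(n_x):                          # the same weights are reused at every step
    a, y_hat = rnn_cell_forward(x_seq[:, [t]], a, Waa, Wax, Wya, ba, by)
```

Note that the loop reuses one set of weights at every time step, which is exactly the weight sharing mentioned above.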
Schematic diagram of RNN backpropagation:
(2) Basic RNN structure
One-to-one: Basic unit
One-to-many: music generation / sequence generation. The output of the previous step is fed back as the input of the current step.
Many-to-one: sentiment classification. A single output is associated with all of the inputs.
Many-to-many: 1. Input and output have the same length: named entity recognition; each output depends on the content up to that point. 2. Input and output have different lengths: machine translation; the input part can be regarded as an encoder and the output part as a decoder.
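The many-to-one case above can be sketched as follows (a toy NumPy example; all names and dimensions are this sketch's own, not from the course code): the recurrent cell runs over every input word, but a single prediction is produced only from the final activation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_a, n_x = 5, 3                        # toy hidden size and vocab size
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wax = rng.standard_normal((n_a, n_x)) * 0.1
w_out = rng.standard_normal((1, n_a))  # single sentiment-score head

a = np.zeros((n_a, 1))                 # a<0>
for x_t in np.eye(n_x).T:              # three one-hot "words", processed in order
    a = np.tanh(Waa @ a + Wax @ x_t.reshape(-1, 1))

# one sigmoid output for the whole sequence, computed from the last activation
sentiment = 1.0 / (1.0 + np.exp(-(w_out @ a)))
```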
(3) Vanishing gradients
During backpropagation, when the network has many layers, the error signal computed at the later layers can hardly propagate back far enough to affect the weights of the early layers. In an RNN, because the gradient vanishes, the output error at later time steps has little effect on the computation at earlier steps, so RNNs are poor at capturing long-term dependencies. For example, consider the agreement between the subject at the start of a passage and the verb form much later in the sentence: because the gap between subject and verb can be very long, it is hard during training to propagate the verb's influence back to the step where the subject appears. In the basic RNN model, the parameters at a given step are mainly affected by nearby steps, so basic RNNs cannot handle long-term dependencies well.
To address the vanishing-gradient problem, the GRU (gated recurrent unit) and the LSTM (long short-term memory) unit were introduced.
(4) GRU and LSTM unit
1. GRU (Gated Recurrent Unit)
The GRU has two gates, the update gate Γu and the relevance gate Γr, which tell the network when to update its memory.
The update gate Γu takes values between 0 and 1 and tells the network when to update the value of the memory cell c⟨t⟩.
The relevance gate Γr measures how relevant c⟨t-1⟩ is for computing the candidate value c̃⟨t⟩.
Expressed in equations:
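The equation screenshot is not reproduced here; the full-GRU equations, as usually written in this notation, are:

```latex
\Gamma_r = \sigma\!\left(W_r \left[c^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_r\right) \\
\Gamma_u = \sigma\!\left(W_u \left[c^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_u\right) \\
\tilde{c}^{\langle t \rangle} = \tanh\!\left(W_c \left[\Gamma_r \ast c^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_c\right) \\
c^{\langle t \rangle} = \Gamma_u \ast \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) \ast c^{\langle t-1 \rangle},
\qquad a^{\langle t \rangle} = c^{\langle t \rangle}
```

When Γu is near 0 the old memory c⟨t-1⟩ is carried forward almost unchanged, which is what lets information survive over long gaps.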
Intuitively, it looks as follows:
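A single GRU step can be sketched in NumPy as follows (a minimal sketch; the function and variable names are this sketch's own, not from the course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_forward(x_t, c_prev, Wu, Wr, Wc, bu, br, bc):
    concat = np.vstack([c_prev, x_t])
    gamma_u = sigmoid(Wu @ concat + bu)          # update gate: when to overwrite memory
    gamma_r = sigmoid(Wr @ concat + br)          # relevance gate: how much of c<t-1> to use
    # candidate c~<t>, computed from the gated previous memory
    c_cand = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    # gated blend of the candidate and the previous memory
    c_t = gamma_u * c_cand + (1.0 - gamma_u) * c_prev
    return c_t

rng = np.random.default_rng(0)
n_c, n_x = 4, 3                                  # toy memory and input sizes
Wu, Wr, Wc = (rng.standard_normal((n_c, n_c + n_x)) for _ in range(3))
bu = br = bc = np.zeros((n_c, 1))
c = gru_cell_forward(rng.standard_normal((n_x, 1)), np.zeros((n_c, 1)),
                     Wu, Wr, Wc, bu, br, bc)
```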
2. LSTM (long short-term memory)
The LSTM has three gates: the update gate Γu, the forget gate Γf, and the output gate Γo.
Expressed in equations:
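The equation screenshot is not reproduced here; the LSTM equations, as usually written in this notation, are:

```latex
\tilde{c}^{\langle t \rangle} = \tanh\!\left(W_c \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_c\right) \\
\Gamma_u = \sigma\!\left(W_u \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_u\right) \\
\Gamma_f = \sigma\!\left(W_f \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_f\right) \\
\Gamma_o = \sigma\!\left(W_o \left[a^{\langle t-1 \rangle}, x^{\langle t \rangle}\right] + b_o\right) \\
c^{\langle t \rangle} = \Gamma_u \ast \tilde{c}^{\langle t \rangle} + \Gamma_f \ast c^{\langle t-1 \rangle},
\qquad a^{\langle t \rangle} = \Gamma_o \ast \tanh c^{\langle t \rangle}
```

Unlike the GRU, the LSTM keeps the memory c⟨t⟩ and the activation a⟨t⟩ separate, and the forget gate Γf replaces the GRU's (1 − Γu) term.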
Intuitively, it looks as follows:
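A single LSTM step can be sketched in NumPy as follows (a minimal sketch; the names and the dictionary-of-weights layout are this sketch's own conventions, not from the course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_t, a_prev, c_prev, W, b):
    """One LSTM step; W and b hold the forget/update/output/candidate parameters."""
    concat = np.vstack([a_prev, x_t])
    gamma_f = sigmoid(W["f"] @ concat + b["f"])   # forget gate: keep old memory?
    gamma_u = sigmoid(W["u"] @ concat + b["u"])   # update gate: admit new memory?
    gamma_o = sigmoid(W["o"] @ concat + b["o"])   # output gate: expose memory?
    c_cand = np.tanh(W["c"] @ concat + b["c"])    # candidate memory c~<t>
    c_t = gamma_f * c_prev + gamma_u * c_cand     # blend of old and candidate memory
    a_t = gamma_o * np.tanh(c_t)                  # activation passed to the output
    return a_t, c_t

rng = np.random.default_rng(0)
n_a, n_x = 4, 3                                   # toy hidden and input sizes
W = {k: rng.standard_normal((n_a, n_a + n_x)) for k in "fuoc"}
b = {k: np.zeros((n_a, 1)) for k in "fuoc"}
a, c = lstm_cell_forward(rng.standard_normal((n_x, 1)),
                         np.zeros((n_a, 1)), np.zeros((n_a, 1)), W, b)
```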
(5) BRNN and Deep RNNs
1. BRNN (Bidirectional Recurrent Neural Network)
Bidirectional recurrent neural networks solve the problem that basic RNNs can only access "past" information. A BRNN is a network composed of units (basic RNN, GRU, or LSTM units) that can reflect both "past" and "future" information.
Each ŷ value is related not only to the "past" but is also affected by the "future".
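Concretely, the output at step t combines a forward activation and a backward activation (written here in the course's usual notation):

```latex
\hat{y}^{\langle t \rangle} = g\!\left(W_y \left[\overrightarrow{a}^{\langle t \rangle},\, \overleftarrow{a}^{\langle t \rangle}\right] + b_y\right)
```

where the forward pass computes the a→ activations from left to right and the backward pass computes the a← activations from right to left.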
2. Deep RNNs (deep recurrent neural network)
A deep recurrent neural network is obtained by stacking multiple RNN layers and unrolling the result in time. Within each layer the weights are shared across time steps, while different layers have their own parameters.
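In equations, the activation of layer l at step t takes input from the same layer at the previous step and from the layer below at the current step (standard notation, since the original figure is not reproduced here):

```latex
a^{[l]\langle t \rangle} = g\!\left(W_a^{[l]} \left[a^{[l]\langle t-1 \rangle},\, a^{[l-1]\langle t \rangle}\right] + b_a^{[l]}\right)
```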
Epilogue
The material studied here is Course 5 of Andrew Ng's Deep Learning specialization: Sequence Models. Some screenshots come from Huang Haiguang's deep learning notes. This content is just a backup of my daily study notes.