Notes on the Principles of Recurrent Neural Networks

(1) Mathematical principles

A recurrent neural network (RNN) is a class of recursive neural network that takes sequence data as input, performs recursion along the evolution direction of the sequence, and in which all nodes (recurrent units) are connected in a chain.
As shown in the figure below, the input at each time step is one of the words X1, X2, X3, ... obtained after word segmentation (each word is then processed, for example with one-hot encoding), and an initial activation a0 is provided (it can be a zero vector or a random vector). A series of operations with the weights Waa and Wax produces a1, and a1 is then combined with Wya to obtain the prediction y^. Under this structure, each y^ depends only on the inputs x up to and including that time step. The weights Waa, Wax, and Wya are shared across all time steps. See below for the calculation formula.
Note: a bidirectional recurrent neural network allows each output to depend on both the earlier and the later parts of the sequence.
[Figure: unrolled RNN structure]
Calculation formula:
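The formulas (shown as an image in the original post) are, reconstructed here in the course's notation so they match the Waa, Wax, and Wya used above:

a^{<t>} = g(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a)
\hat{y}^{<t>} = g(W_{ya} a^{<t>} + b_y)

where g is typically tanh for the activation and sigmoid or softmax for the output.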
[Figure: schematic diagram of RNN forward propagation]
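As a concrete illustration of the forward step, here is a minimal NumPy sketch (the function and variable names are my own, not code from the course):

import numpy as np

def rnn_cell_forward(x_t, a_prev, Waa, Wax, Wya, ba, by):
    """One RNN time step: compute the new activation and the prediction."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)        # hidden-state update
    z = Wya @ a_t + by
    y_hat = np.exp(z) / np.sum(np.exp(z), axis=0)       # softmax output
    return a_t, y_hat

# Example shapes: n_a hidden units, n_x input size, n_y output size
n_a, n_x, n_y = 5, 3, 2
rng = np.random.default_rng(0)
Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
Wya = rng.standard_normal((n_y, n_a))
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))
a0 = np.zeros((n_a, 1))                                 # initial activation a0 (a zero vector)
x1 = rng.standard_normal((n_x, 1))                      # first (encoded) input word
a1, y1_hat = rnn_cell_forward(x1, a0, Waa, Wax, Wya, ba, by)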
Backpropagation in an RNN is backpropagation through time: the computation proceeds in reverse time order, from the last time step back to the first.
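The quantity being propagated back is the overall loss, which the course defines as the sum of per-time-step losses; with the binary cross-entropy loss used in the course's example:

L^{<t>}(\hat{y}^{<t>}, y^{<t>}) = -y^{<t>} \log \hat{y}^{<t>} - (1 - y^{<t>}) \log(1 - \hat{y}^{<t>})
L = \sum_{t=1}^{T_y} L^{<t>}(\hat{y}^{<t>}, y^{<t>})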
[Figure: schematic diagram of RNN backpropagation through time]

(2) Basic RNN structure

One-to-one: the basic unit.
One-to-many: music generation / sequence generation. The output of the previous time step is fed in as the input of the current time step.
Many-to-one: sentiment classification. The single output is associated with all of the inputs.
Many-to-many: 1. Input and output lengths are equal, e.g. named entity recognition; each output depends on the content up to that point. 2. Input and output lengths differ, e.g. machine translation; the input part can be regarded as an encoder and the output part as a decoder.
[Figure: basic RNN structures]
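A rough sketch of how these variants differ, reusing the hypothetical rnn_cell_forward from the earlier sketch: a many-to-many model with equal lengths keeps a prediction at every time step, while a many-to-one model keeps only the last one.

def rnn_forward(x_seq, a0, Waa, Wax, Wya, ba, by, many_to_many=True):
    """Run rnn_cell_forward over a whole input sequence (a list of column vectors)."""
    a_t, y_hats = a0, []
    for x_t in x_seq:
        a_t, y_hat = rnn_cell_forward(x_t, a_t, Waa, Wax, Wya, ba, by)
        y_hats.append(y_hat)
    # many-to-many (equal lengths, e.g. named entity recognition): one prediction per input
    # many-to-one (e.g. sentiment classification): keep only the final prediction
    return y_hats if many_to_many else y_hats[-1]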

(3) Vanishing gradients

When a network has many layers, it is difficult during backpropagation for the error computed from y at the later layers to propagate back and influence the weights of the earlier layers. In an RNN the same thing happens along the time axis: because the gradient vanishes, the output error at later time steps has little effect on the computations at earlier time steps, so the RNN is poor at capturing long-term dependencies. For example, the subject at the start of a sentence determines the verb form much later in the sentence; since the gap between subject and verb can be very long, it is hard during training for the verb's error signal to reach the time step that processes the subject and establish the relationship. In the basic RNN model, the parameters at a given time step are mainly influenced by nearby time steps, so the basic RNN cannot handle long-term dependencies well.
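A quick way to see the effect numerically: during backpropagation through time the error signal is scaled roughly once per time step by the recurrent Jacobian, so a factor smaller than one shrinks it exponentially. The 0.9 below is an arbitrary stand-in for that per-step factor:

import numpy as np

grad = np.ones(4)          # error signal arriving at the last time step
factor = 0.9               # stand-in for the per-step scaling (smaller than 1)
for _ in range(100):       # propagate back through 100 time steps
    grad = factor * grad
print(grad[0])             # about 2.7e-05: almost nothing reaches the early steps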
To address the vanishing-gradient problem, the GRU (gated recurrent unit) and the LSTM (long short-term memory) unit were introduced.

(4) GRU and LSTM unit

1. GRU (Gated Recurrent Unit)

The GRU has two gates: an update gate Γu and a relevance (correlation) gate Γr.
The update gate Γu is a value between 0 and 1 that tells the network when to update the value of the memory cell c<t>.
The relevance gate Γr tells how relevant the previous memory cell c<t-1> is for computing the candidate c̃<t>.

Expressed mathematically:
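The formulas (shown as an image in the original post) are, reconstructed here in the course's notation, with * denoting element-wise multiplication and a^{<t>} = c^{<t>} in the GRU:

\Gamma_u = \sigma(W_u [c^{<t-1>}, x^{<t>}] + b_u)
\Gamma_r = \sigma(W_r [c^{<t-1>}, x^{<t>}] + b_r)
\tilde{c}^{<t>} = \tanh(W_c [\Gamma_r * c^{<t-1>}, x^{<t>}] + b_c)
c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + (1 - \Gamma_u) * c^{<t-1>}
a^{<t>} = c^{<t>}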
[Figure: GRU unit, intuitive view]
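A minimal NumPy sketch of one GRU step under these equations (the function and parameter names are my own, not code from the course):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell_forward(x_t, c_prev, Wc, Wu, Wr, bc, bu, br):
    """One GRU step: the gates decide how much of the memory cell to overwrite."""
    concat = np.vstack([c_prev, x_t])                                 # [c<t-1>, x<t>]
    gamma_u = sigmoid(Wu @ concat + bu)                               # update gate
    gamma_r = sigmoid(Wr @ concat + br)                               # relevance gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)   # candidate memory
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev                # keep or overwrite
    return c_t                                                        # in a GRU, a<t> = c<t>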

2. LSTM (long short-term memory)

The LSTM has three gates: an update gate Γu, a forget gate Γf, and an output gate Γo.
Expressed mathematically:
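The formulas (shown as an image in the original post) are, reconstructed here in the course's notation; unlike the GRU, the hidden state a^{<t>} and the memory cell c^{<t>} are now distinct:

\tilde{c}^{<t>} = \tanh(W_c [a^{<t-1>}, x^{<t>}] + b_c)
\Gamma_u = \sigma(W_u [a^{<t-1>}, x^{<t>}] + b_u)
\Gamma_f = \sigma(W_f [a^{<t-1>}, x^{<t>}] + b_f)
\Gamma_o = \sigma(W_o [a^{<t-1>}, x^{<t>}] + b_o)
c^{<t>} = \Gamma_u * \tilde{c}^{<t>} + \Gamma_f * c^{<t-1>}
a^{<t>} = \Gamma_o * \tanh(c^{<t>})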
[Figure: LSTM unit, intuitive view]
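A matching NumPy sketch of one LSTM step (reusing the sigmoid helper and import from the GRU sketch above; names are my own):

def lstm_cell_forward(x_t, a_prev, c_prev, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    """One LSTM step: update, forget, and output gates control the memory cell."""
    concat = np.vstack([a_prev, x_t])             # [a<t-1>, x<t>]
    gamma_u = sigmoid(Wu @ concat + bu)           # update gate
    gamma_f = sigmoid(Wf @ concat + bf)           # forget gate
    gamma_o = sigmoid(Wo @ concat + bo)           # output gate
    c_tilde = np.tanh(Wc @ concat + bc)           # candidate memory
    c_t = gamma_u * c_tilde + gamma_f * c_prev    # mix the candidate with the old memory
    a_t = gamma_o * np.tanh(c_t)                  # hidden state passed to the next step
    return a_t, c_t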

(5) BRNN and Deep RNNs

1. BRNN (Bidirectional Recurrent Neural Network)

A bidirectional recurrent neural network solves the problem that a basic (unidirectional) RNN can only use "past" information. It can be built from any of the unit types (basic RNN units, GRU units, LSTM units) and runs a recurrence in both directions, so its outputs can reflect both "past" and "future" information.

[Figure: bidirectional RNN structure]
The value of y is therefore not only related to the "past" but is also affected by the "future".
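Concretely, the prediction at each time step combines a forward activation (computed left to right) with a backward activation (computed right to left); in the course's notation:

\hat{y}^{<t>} = g(W_y [\overrightarrow{a}^{<t>}, \overleftarrow{a}^{<t>}] + b_y)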

2. Deep RNNs (deep recurrent neural network)

A deep recurrent neural network stacks several recurrent layers and then unrolls the stack in time. Within each layer, the weights are shared across time steps, while each layer has its own parameters.
[Figure: deep RNN structure]
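In a deep RNN, the activation of layer l at time t takes input both from the same layer at the previous time step and from the layer below at the same time step; in the course's notation:

a^{[l]<t>} = g(W_a^{[l]} [a^{[l]<t-1>}, a^{[l-1]<t>}] + b_a^{[l]})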

Epilogue

The material covered here is Course 5 of Wu Enda's (Andrew Ng's) Deep Learning specialization: Sequence Models.
Some screenshots are taken from Huang Haiguang's deep learning notes.
This post is simply a backup of my daily study notes.

Origin blog.csdn.net/qq_43842886/article/details/113406878