Notes on the Recurrent Neural Network (RNN) model, its forward and back-propagation algorithms, and the LSTM model

Recurrent Neural Networks (RNNs) are neural networks with feedback between the output and the model. They are widely used in speech recognition, natural language processing, handwriting recognition, machine translation, and other fields.
Features:
1. The hidden state h(t) at position t is determined by the current input x(t) and the previous hidden state h(t-1) (a minimal forward-step sketch follows this list).
2. The linear-relationship parameter matrices U, W, and V are shared across all positions in the sequence, which reflects the RNN idea of a feedback loop.
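A minimal numpy sketch of a single RNN forward step, assuming a tanh hidden activation and reusing the U, W, V parameter names from the list above, with bias vectors b, c and a softmax output layer as one common choice (not the only one):

```python
import numpy as np

def rnn_step(x_t, h_prev, U, W, V, b, c):
    """One RNN forward step: the same U, W, V are reused at every position t."""
    h_t = np.tanh(U @ x_t + W @ h_prev + b)    # new hidden state from x(t) and h(t-1)
    o_t = V @ h_t + c                          # linear output
    y_t = np.exp(o_t) / np.sum(np.exp(o_t))    # softmax prediction
    return h_t, y_t

# toy usage: hidden size 4, input size 3, output size 2
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
b, c = np.zeros(4), np.zeros(2)
h = np.zeros(4)
for x in rng.normal(size=(5, 3)):              # a sequence of 5 inputs
    h, y = rnn_step(x, h, U, W, V, b, c)
```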
In practical applications such as speech recognition, handwriting recognition, and machine translation, the more widely used model is the LSTM, which is based on the RNN.
Long Short-Term Memory (LSTM) networks were designed to avoid the vanishing-gradient problem of conventional RNNs.
Vanishing gradient: neural networks are mainly trained with the BP algorithm (error back-propagation), which applies the chain rule, i.e. multiplies many derivatives together. The derivative of the sigmoid function is at most 0.25, and most inputs are pushed into the saturated regions on both sides, so after a sigmoid activation most derivatives are very small. Multiplying many values that are at most 0.25 gives a very small result. As the network gets deeper, the gradient that propagates back to the shallow layers barely perturbs their parameters; in other words, almost no information from the loss reaches the shallow layers, so those layers cannot learn. This is called the vanishing gradient. Solutions to the vanishing gradient include the following (a small numeric sketch of the effect follows the list):
A. Use other activation functions such as ReLU.
B. Layer-wise normalization.
C. Better weight-initialization methods.
D. Construct new network structures, such as highway networks, or, more drastically, capsule networks, which aim to do away with the BP learning process.
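A small numeric sketch of the vanishing-gradient effect described above: the sigmoid derivative peaks at 0.25, so a chain-rule product over many layers shrinks toward zero even in the best case (illustrative numbers only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # maximum value 0.25, reached at z = 0

# best case: every layer sits at z = 0, the peak of the derivative
for depth in (5, 10, 20):
    print(depth, 0.25 ** depth)   # roughly 1e-3, 1e-6, 1e-12

# typical case: random pre-activations land in flatter (saturated) regions,
# so the chain-rule product of derivatives is even smaller
z = np.random.default_rng(0).normal(scale=3.0, size=20)
print(np.prod(sigmoid_grad(z)))
```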
At each sequence index position t, besides the hidden state h(t), the LSTM propagates forward another hidden state, called the cell state C(t). In addition to the cell state, the LSTM contains extra structures called gates. At each sequence index position t, the LSTM generally has three gates: the forget gate, the input gate, and the output gate.
Forget gate: in the LSTM it controls, with a certain probability, whether to forget the cell state from the previous position. Its output represents the probability of forgetting the previous cell state.
Input gate: responsible for processing the input at the current sequence position. The first part uses a sigmoid activation function and the second part uses a tanh activation function; the two results are multiplied and then used to update the cell state. The multiplication here is the Hadamard (element-wise) product, not the ordinary matrix product.
Output gate: controls how much of the updated cell state, after a tanh activation, is passed on to the hidden state h(t) at the current position (see the sketch below for all three gates).
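A minimal numpy sketch of one LSTM forward step with the three gates described above, following the standard equations; the weight names W_f, W_i, W_a, W_o and biases b_* are illustrative, each acting on the concatenation [h(t-1); x(t)], and `*` on arrays is the Hadamard product:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step: p holds weight matrices W_* and bias vectors b_*."""
    z = np.concatenate([h_prev, x_t])            # combined input [h(t-1); x(t)]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])       # forget gate: how much of C(t-1) to keep
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])       # input gate, sigmoid part
    a_t = np.tanh(p["W_a"] @ z + p["b_a"])       # input gate, tanh candidate values
    c_t = f_t * c_prev + i_t * a_t               # new cell state (Hadamard products)
    o_t = sigmoid(p["W_o"] @ z + p["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                     # new hidden state
    return h_t, c_t

# toy usage: hidden size 4, input size 3
rng = np.random.default_rng(0)
p = {k: rng.normal(size=(4, 7)) for k in ("W_f", "W_i", "W_a", "W_o")}
p.update({k: np.zeros(4) for k in ("b_f", "b_i", "b_a", "b_o")})
h, c = np.zeros(4), np.zeros(4)
for x in rng.normal(size=(5, 3)):                # a sequence of 5 inputs
    h, c = lstm_step(x, h, c, p)
```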
Reference Blog:
https://www.cnblogs.com/pinard/p/6509630.html
https://www.cnblogs.com/pinard/p/6519110.html
