[Deep Learning] Recurrent Neural Networks (RNN) and the LSTM Model

1. Recurrent Neural Network (RNN)

The core idea of a recurrent neural network is that it can use earlier sequence data to help predict later results. How is this achieved? The RNN structure is shown in the figure below.
[Figure: unrolled RNN structure]
After the information from the earlier part of the sequence is processed, it is passed on to the later part as additional input.

Mathematical model:

$$a^1 = g_a(W_h a^0 + W_i x^1 + b_a), \qquad y^1 = g_y(W_y a^0 + W_o x^1 + b_y)$$
$$a^2 = g_a(W_h a^1 + W_i x^2 + b_a), \qquad y^2 = g_y(W_y a^1 + W_o x^2 + b_y)$$
……
$$a^t = g_a(W_h a^{t-1} + W_i x^t + b_a), \qquad y^t = g_y(W_y a^{t-1} + W_o x^t + b_y)$$

Here $g$ is an activation function, and $W$, $b$ are the trainable parameters.
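To make the recurrence concrete, here is a minimal NumPy sketch of the forward pass implementing the equations above as written. The dimensions, the tanh/sigmoid choices for $g_a$ and $g_y$, and the zero initial state are illustrative assumptions, not taken from the article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_forward(xs, W_h, W_i, W_y, W_o, b_a, b_y, a0):
    """Unroll a^t = g_a(W_h a^{t-1} + W_i x^t + b_a) and
    y^t = g_y(W_y a^{t-1} + W_o x^t + b_y) over a whole sequence."""
    a, ys = a0, []
    for x in xs:                                  # one iteration per time step
        a_new = np.tanh(W_h @ a + W_i @ x + b_a)  # new hidden state a^t
        y = sigmoid(W_y @ a + W_o @ x + b_y)      # output from a^{t-1} and x^t
        ys.append(y)
        a = a_new                                 # carry the state to the next step
    return np.stack(ys), a

# Toy sizes (assumed): input dim 3, hidden dim 4, output dim 2, sequence length 5.
rng = np.random.default_rng(0)
xs = rng.normal(size=(5, 3))
W_h, W_i, b_a = rng.normal(size=(4, 4)), rng.normal(size=(4, 3)), np.zeros(4)
W_y, W_o, b_y = rng.normal(size=(2, 4)), rng.normal(size=(2, 3)), np.zeros(2)
ys, a_last = rnn_forward(xs, W_h, W_i, W_y, W_o, b_a, b_y, a0=np.zeros(4))
print(ys.shape)  # (5, 2): one output per time step
```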

2. Different types of RNN models

Basic RNN model structure
[Figure: many-to-many RNN with equal input and output lengths]
Input: $x^1, x^2, x^3, \dots, x^i$; Output: $y^1, y^2, y^3, \dots, y^i$
A many-to-many RNN structure in which the input and output have the same length.
Application: identifying specific information at each position of a sequence (e.g., named-entity recognition).

Multiple input single output RNN structure
[Figure: many-to-one RNN]

Input: $x^1, x^2, x^3, \dots, x^i$; Output: $y$
Application: emotion recognition (e.g., sentiment classification).
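As a sketch of how these two patterns differ in practice, here is a small PyTorch example (PyTorch and all layer sizes are my own assumptions, not the article's): the equal-length many-to-many case uses the output at every time step, while the many-to-one case keeps only the final hidden state.

```python
import torch
import torch.nn as nn

# Toy sizes (assumed): feature dim 8, hidden dim 16, batch of 2 sequences of length 10.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(2, 10, 8)                       # (batch, time, features)
outputs, h_last = rnn(x)                        # outputs: (2, 10, 16), h_last: (1, 2, 16)

# Many-to-many with equal lengths (e.g. tagging every position): use every step's output.
tag_head = nn.Linear(16, 5)                     # 5 hypothetical tag classes
per_step_scores = tag_head(outputs)             # (2, 10, 5)

# Many-to-one (e.g. emotion/sentiment recognition): keep only the last hidden state.
cls_head = nn.Linear(16, 2)                     # 2 hypothetical sentiment classes
sequence_scores = cls_head(h_last.squeeze(0))   # (2, 2)
print(per_step_scores.shape, sequence_scores.shape)
```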

Single input multiple output RNN structure
[Figure: one-to-many RNN]

Input: $x$; Output: $y^1, y^2, y^3, \dots, y^i$
Application: sequence data generation, such as article generation or music generation.

Multiple-input multiple-output RNN structure
[Figure: many-to-many RNN with different input and output lengths]
Input: $x^1, x^2, x^3, \dots, x^i$; Output: $y^1, y^2, y^3, \dots, y^j$ (the input and output lengths can differ)
Application: machine translation.
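Below is a rough sketch of this different-length many-to-many pattern as an encoder-decoder, the usual setup for translation. The model class, vocabulary sizes, and hidden size are all invented for illustration; a real translation model would also need tokenization, training, and step-by-step decoding at inference time.

```python
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):
    """Encoder reads the whole source sequence; the decoder then emits a
    target sequence of a different length, conditioned on the encoder state."""
    def __init__(self, src_vocab=20, tgt_vocab=30, hidden=32):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.RNN(hidden, hidden, batch_first=True)
        self.decoder = nn.RNN(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        _, h = self.encoder(self.src_emb(src_tokens))            # summary of the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt_tokens), h)   # decode from that summary
        return self.out(dec_out)                                 # scores per target position

model = TinySeq2Seq()
src = torch.randint(0, 20, (2, 7))    # batch of 2 source sentences, length 7
tgt = torch.randint(0, 30, (2, 11))   # target sentences of a different length, 11
print(model(src, tgt).shape)          # torch.Size([2, 11, 30])
```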

Bidirectional Recurrent Neural Network (BRNN)
When making a prediction, a BRNN also takes information from the later part of the sequence into account.

[Figure: bidirectional RNN (BRNN)]

Deep Recurrent Neural Network (DRNN)
DRNNs handle more complex sequence tasks; they stack several single-layer RNNs and can add fully connected layers before the output.
[Figure: deep RNN (DRNN)]
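A minimal PyTorch sketch of both variants (again, PyTorch and the sizes are my assumptions): `bidirectional=True` adds a second RNN that reads the sequence backwards, and `num_layers` stacks several recurrent layers to form a deep RNN, optionally followed by a fully connected output layer.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 10, 8)   # (batch, time, features); sizes are illustrative

# Bidirectional RNN: each output also sees the later part of the sequence.
brnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
out_b, _ = brnn(x)
print(out_b.shape)          # (2, 10, 32): forward and backward states concatenated

# Deep RNN: three stacked recurrent layers, then a fully connected layer before the output.
drnn = nn.RNN(input_size=8, hidden_size=16, num_layers=3, batch_first=True)
out_d, _ = drnn(x)
head = nn.Linear(16, 4)     # hypothetical output layer
print(head(out_d).shape)    # (2, 10, 4)
```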

3. Defects of the ordinary RNN structure

First, as information from the early part of the sequence is passed toward the back, its weight keeps shrinking, so important information is lost; during training this shows up as the vanishing-gradient problem.
[Figure: early-sequence information fading as it is passed forward]
Second, an RNN can also suffer from exploding gradients during training.
[Figure: training loss fluctuating violently]
While minimizing the loss function, the gradient can suddenly become extremely large and jitter violently, causing the parameter updates to break down.
[Figure: steep region of the RNN error surface]
So why is this happening?
Take the RNN structure in the figure below as an example. Assume the weights $W_i$ and $W_o$ are all 1 and the input sequence has length 1000; then $y^{1000} = w^{999}$, where $w$ is the recurrent weight.
[Figure: RNN unrolled for 1000 time steps with recurrent weight w]

Assume the initial value of $w$ is 1. What happens when $w$ changes slightly? If $w = 1.01$, then $y^{1000} = 1.01^{999} \approx 20000$, while if $w = 0.99$, then $y^{1000} = 0.99^{999} \approx 0$.
[Figure: y^1000 for small changes in w]
So the problem with RNNs is that the same weight $w$ is reused at every time step during training; once $w$ drifts even slightly, that small change is multiplied across the whole sequence and has an enormous effect, so the gradient with respect to $w$ either explodes or vanishes.
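A two-line check of this effect (the printed values are approximate):

```python
# y^1000 = w^999 when the same recurrent weight w is reused 999 times.
for w in (1.0, 1.01, 0.99):
    print(w, w ** 999)
# 1.00 -> 1.0
# 1.01 -> about 2.1e4 : a tiny increase makes the output (and gradient) explode
# 0.99 -> about 4.3e-5: a tiny decrease makes it vanish
```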

4. Long Short-Term Memory Network (LSTM)

Using an LSTM can remedy the defects of the ordinary RNN structure described above: a memory cell $c^i$ is added to the basic RNN unit, which can carry information from distant, earlier parts of the sequence.
[Figure: RNN unit with an added memory cell]
The LSTM unit consists of three gates, four inputs, and one output. The three gates are the input gate, the forget gate, and the output gate; they control which information should be forgotten and discarded, which should be retained, and which should pass through unchanged. The four inputs are the input data $Z$, the input-gate control signal $Z_i$, the forget-gate control signal $Z_f$, and the output-gate control signal $Z_o$.
[Figure: LSTM unit with input, forget, and output gates]

A gate here can be understood as an activation function, usually a sigmoid: because the sigmoid's value lies between 0 and 1, it can be used to control how open or closed the gate is.

Assume the value originally stored in the memory cell is $c$; the updated value is $c' = g(z)f(z_i) + c\,f(z_f)$, where $c'$ is the new value stored in memory. From this formula we can see that $f(z_i)$ controls whether the input $z$ is allowed in: $f(z_i)=0$ means no input, while $f(z_i)=1$ means the input passes through. $f(z_f)$ controls whether the value in memory is kept: when $f(z_f)=0$ the old value is forgotten and 0 is written into the cell, and when $f(z_f)=1$ the old value $c$ passes through unchanged. Finally, $f(z_o)$ controls whether there is an output value.

$Z$, $Z_i$, $Z_f$, and $Z_o$ are all obtained by multiplying the input $x$ by a weight matrix. The cell processes the input sequence as shown below.
[Figure: LSTM cell processing an input sequence]
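Here is a minimal NumPy sketch of one step of this gated update. Following the text, each control signal is computed only from the input $x$ times its own weight matrix (a full LSTM would also use the previous hidden state); the tanh applied to the cell value before the output gate, the omitted biases, and all dimensions are my own assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, c, W_z, W_i, W_f, W_o):
    """One gated update: c' = g(z) f(z_i) + c f(z_f), output = tanh(c') f(z_o)."""
    g_z  = np.tanh(W_z @ x)        # g(z): candidate input
    f_zi = sigmoid(W_i @ x)        # f(z_i): input gate, may this input enter?
    f_zf = sigmoid(W_f @ x)        # f(z_f): keep (1) or clear (0) the old memory
    f_zo = sigmoid(W_o @ x)        # f(z_o): expose the memory or not
    c_new = g_z * f_zi + c * f_zf  # memory is accumulated, not overwritten
    return np.tanh(c_new) * f_zo, c_new

# Toy sizes (assumed): input dim 3, memory dim 4.
rng = np.random.default_rng(1)
W_z, W_i, W_f, W_o = (rng.normal(size=(4, 3)) for _ in range(4))
c = np.zeros(4)
for x in rng.normal(size=(6, 3)):   # walk a short input sequence through the cell
    out, c = lstm_cell_step(x, c, W_z, W_i, W_f, W_o)
print(out, c)
```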
LSTM can alleviate the vanishing-gradient problem of RNNs. In an ordinary RNN the value in memory is overwritten at every time step, whereas in an LSTM the memory is accumulated additively and is cleared only when the forget gate shuts it off.

Original article: https://blog.csdn.net/Luo_LA/article/details/133314652