LSTM and GRU Derivation

LSTM (Long Short-Term Memory) is an improved recurrent neural network that can handle the long-distance dependencies a plain RNN cannot.

 

The original RNN hidden layer has only one state, h, which is very sensitive to short-term inputs. LSTM adds another state, c, to preserve long-term information; it is called the cell state.

 

Unrolled along the time dimension, the network looks as follows:

 

At time t, the LSTM has three inputs: the current network input x_t, the previous output h_{t-1}, and the previous cell state c_{t-1}. It has two outputs: the current output h_t and the current cell state c_t. Three controls act as switches on the long-term state c:

 

In the implementation, the three controls are realized with gate functions:

A gate is simply a fully connected layer: its input is a vector, and its output is a vector of real numbers between 0 and 1.
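As a minimal sketch, a gate is just a linear map followed by a sigmoid. The weights below are made-up illustrative values; a real LSTM learns them during training.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# A gate as a fully connected layer followed by a sigmoid.
# W and b are illustrative values only.
W = np.array([[0.2, -0.5, 1.0],
              [0.7,  0.1, -0.3]])  # gate weight matrix
b = np.array([0.0, 0.5])           # gate bias
x = np.array([1.0, 2.0, -1.0])     # input vector

g = sigmoid(W @ x + b)             # gate output
print(g)  # each component lies strictly between 0 and 1
```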

 

 

Gating principle: the gate's output vector is multiplied element-wise with the vector we want to control. When a gate component is 0, multiplying by 0 means that component does not pass at all; when it is 1, multiplying by 1 leaves the component unchanged.
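The gating principle can be illustrated with extreme gate values of exactly 0 and 1 (a sketch with hypothetical numbers; real gate outputs can lie anywhere in between):

```python
import numpy as np

# Vector to be controlled, and a gate with extreme 0/1 values
# (illustrative numbers only).
v = np.array([0.5, -1.2, 3.0, 0.8])
gate = np.array([0.0, 1.0, 0.0, 1.0])

controlled = gate * v  # element-wise product
# A 0 in the gate blocks that component; a 1 passes it unchanged.
print(controlled)  # components: 0.0, -1.2, 0.0, 0.8
```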

Computing the forget gate:

 

 

Forget gate: decides how much of the previous cell state c_{t-1} is kept in the current state c_t. W_f is the forget gate's weight matrix, [h_{t-1}, x_t] denotes the concatenation of the two vectors, b_f is the forget gate's bias term, and sigma is the sigmoid function.

 

Computing the input gate:

 

Input gate: decides how much of the current network input x_t is saved into the cell state c_t.

The current cell state is computed from the previous cell state and the current input:

 

 

The current cell state c_t is obtained by multiplying the previous cell state c_{t-1} element-wise by the forget gate f_t, multiplying the candidate state derived from the current input element-wise by the input gate i_t, and adding the two products. This combines the current memory with the long-term memory into a new cell state. Because of the forget gate, information from long ago can be preserved; because of the input gate, irrelevant content is kept out of memory.
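A tiny numeric sketch of this update, c_t = f_t * c_{t-1} + i_t * C'_t (all values hypothetical):

```python
import numpy as np

# Element-wise cell-state update with illustrative values.
c_prev = np.array([1.0, 2.0])    # previous cell state c_{t-1}
f_t = np.array([0.9, 0.1])       # forget gate: keep 90% / keep 10%
i_t = np.array([0.5, 0.5])       # input gate
c_cand = np.array([0.4, -0.4])   # candidate state C'_t from the current input

c_t = f_t * c_prev + i_t * c_cand
print(c_t)  # first component is mostly remembered, second mostly forgotten
```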

The goal of training is to learn 8 groups of parameters: the weight matrices W_f, W_i, W_c, W_o and the bias terms b_f, b_i, b_c, b_o.
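The parameters and their shapes can be sketched as follows (sizes are arbitrary; each weight matrix acts on the concatenation [h_{t-1}, x_t]):

```python
import numpy as np

input_size, hidden_size = 10, 20
rng = np.random.default_rng(0)

# 4 weight matrices + 4 bias vectors = 8 groups of parameters.
params = {}
for name in ("f", "i", "c", "o"):
    # Each W_* is the splice of a (hidden x hidden) block for h_{t-1}
    # and a (hidden x input) block for x_t.
    params[f"W_{name}"] = rng.normal(size=(hidden_size, hidden_size + input_size))
    params[f"b_{name}"] = np.zeros(hidden_size)

print(len(params))  # 8
```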

 

Each weight matrix is formed by concatenating two matrices: one applied to h_{t-1} and one applied to x_t. For backpropagation of the error through time, the error term at time t is defined as:

 

 

The gradients of the weight matrices are then computed as follows:

 

Summary of the overall process:

The inputs to the current recurrent step are the input x_t, the previous step's output h_{t-1}, and the previous step's state C_{t-1}.

First, x_t and h_{t-1} pass through the forget gate:

f_{t}=sigmoid(W_f[h_{t-1},x_t]+b_f)

The forget gate produces an output f_t between 0 and 1, representing how much of the previous state C_{t-1} is forgotten: f_t = 0 means forget everything, f_t = 1 means keep everything.

On another path, x_t and h_{t-1} meet the input gate, which decides which values to store in memory:

i_t=sigmoid(W_i[h_{t-1},x_t]+b_i)

At the same time, a tanh function generates a candidate state C'_t:

C'_t=tanh(W_C[h_{t-1},x_t]+b_C)

Now C_{t-1}, f_t, C'_t, and i_t together determine the current cell state C_t:

C_t=f_t*C_{t-1}+i_t*C'_t

With the current state in hand, we can move on to the output gate:

o_t=sigmoid(W_o[h_{t-1},x_t]+b_o)

h_t=o_t*tanh(C_t)

From the equations above it is easy to see that every gate has the same form: a sigmoid function applied to the current input x_t and the previous output h_{t-1} produces a value between 0 and 1 that determines how much information passes.
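Putting the equations above together, one forward time step can be sketched in NumPy (a minimal sketch; parameter values and sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM time step following the equations above."""
    z = np.concatenate([h_prev, x_t])   # the splice [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    c_cand = np.tanh(W_c @ z + b_c)     # candidate state C'_t
    c_t = f_t * c_prev + i_t * c_cand   # new cell state
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(c_t)            # new output
    return h_t, c_t

# Tiny usage example with random parameters (shapes are arbitrary).
rng = np.random.default_rng(42)
n_in, n_hid = 3, 4
make_W = lambda: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
W_f, W_i, W_c, W_o = (make_W() for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(n_hid)

h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid),
                 W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o)
print(h.shape, c.shape)  # (4,) (4,)
```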

 

 

Source: www.cnblogs.com/limingqi/p/12638664.html