RNN, LSTM and GRU: An Overview

RNN illustration

This is the network structure. Since it evolves over time, it can be unrolled along the time axis for a clearer picture.

(figure: RNN network structure)
Unrolled over time steps:
(figure: the RNN unrolled across time steps, sharing the same weights at every step)
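To make the unrolling concrete, here is a minimal NumPy sketch of a vanilla RNN forward pass. The names (`rnn_forward`, `W_h`, `W_x`) are illustrative choices, not from the original post:

```python
import numpy as np

def rnn_forward(x_seq, h0, W_h, W_x, b):
    """Unroll a vanilla RNN: h_t = tanh(W_h h_{t-1} + W_x x_t + b)."""
    h = h0
    hs = []
    for x_t in x_seq:  # one update per time step, same weights every step
        h = np.tanh(W_h @ h + W_x @ x_t + b)
        hs.append(h)
    return hs

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 4, 5
x_seq = [rng.standard_normal(input_dim) for _ in range(T)]
hs = rnn_forward(x_seq,
                 h0=np.zeros(hidden_dim),
                 W_h=rng.standard_normal((hidden_dim, hidden_dim)),
                 W_x=rng.standard_normal((hidden_dim, input_dim)),
                 b=np.zeros(hidden_dim))
print(len(hs), hs[-1].shape)  # 5 (4,)
```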

LSTM

References: reference 1, reference 2, reference 3
(figure: LSTM cell structure)
Why tanh?
To combat vanishing gradients, we need an activation whose second derivative stays away from zero over a long range before decaying. tanh is a suitable function with this property.
Why sigmoid?
The sigmoid squashes its input into the range (0, 1), so its output can act as a soft switch that decides whether information is forgotten or remembered.
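A tiny demo of the gating idea (the values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A gate value near 0 suppresses information; near 1 it passes it through.
value = np.array([2.0, -3.0, 0.5])
for z in (-6.0, 0.0, 6.0):
    gate = sigmoid(z)  # always in (0, 1), never exactly 0 or 1
    print(f"gate={gate:.3f} -> gated value={gate * value}")
```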

The effect of the three gates:
(figure: LSTM cell with forget, input, and output gates)
When $f_t = 0$ and $i_t = 1$: the history information contained in the previous internal state $c_{t-1}$ is discarded (the history is cleared), and the internal state $c_t$ records only the candidate state $\tilde{c}_t$ at time $t$.
When $f_t = 1$ and $i_t = 0$: the internal state $c_t$ simply copies the previous internal state $c_{t-1}$, and no new information brought in by $x_t$ is written (see the numeric sketch below).
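A quick numeric sketch of these two extreme cases (the vectors are arbitrary example values):

```python
import numpy as np

c_prev = np.array([1.0, -2.0, 3.0])  # previous internal state c_{t-1}
c_cand = np.array([0.5, 0.5, 0.5])   # candidate state at time t

def new_state(f_t, i_t):
    # c_t = f_t * c_{t-1} + i_t * candidate  (elementwise)
    return f_t * c_prev + i_t * c_cand

print(new_state(f_t=0.0, i_t=1.0))  # history cleared: [0.5 0.5 0.5]
print(new_state(f_t=1.0, i_t=0.0))  # history copied:  [ 1. -2.  3.]
```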

Forget gate equation:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
where $W_f \cdot [h_{t-1}, x_t]$ can equivalently be written in the split form:
$$W_f \cdot [h_{t-1}, x_t] = W_{fh}\, h_{t-1} + W_{fx}\, x_t$$
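A small NumPy check of this equivalence (the shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_dim, input_dim = 4, 3
h = rng.standard_normal(hidden_dim)
x = rng.standard_normal(input_dim)

# One matrix acting on the concatenation [h, x] ...
W_f = rng.standard_normal((hidden_dim, hidden_dim + input_dim))
concat_form = W_f @ np.concatenate([h, x])

# ... equals two matrices acting on h and x separately.
W_fh, W_fx = W_f[:, :hidden_dim], W_f[:, hidden_dim:]
split_form = W_fh @ h + W_fx @ x

print(np.allclose(concat_form, split_form))  # True
```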
Input gate:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
Memory cell state (candidate state and update):
$$\tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
Output gate:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
Final output:
$$h_t = o_t \odot \tanh(c_t)$$
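Putting the equations together, here is a minimal NumPy sketch of a single LSTM step. The parameter layout and names (`lstm_step`, the gate dictionaries) are illustrative choices, not from any specific library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the equations above.

    W / b hold the four transforms keyed by gate name;
    each W[k] has shape (hidden_dim, hidden_dim + input_dim).
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state
    c_t = f_t * c_prev + i_t * c_tilde       # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(c_t)                 # final output
    return h_t, c_t

# Usage example with random parameters.
rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 4
W = {k: rng.standard_normal((hidden_dim, hidden_dim + input_dim)) for k in "fico"}
b = {k: np.zeros(hidden_dim) for k in "fico"}
h, c = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, c = lstm_step(rng.standard_normal(input_dim), h, c, W, b)
print(h.shape, c.shape)  # (4,) (4,)
```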

GRU

(figure: GRU cell structure)
Formulas:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t]), \qquad r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t]), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
Compared with the LSTM, the GRU merges the three gates (input, forget, output) into two: the forget gate and input gate are combined into a single update gate ($z_t$), and a reset gate ($r_t$) is added. The cell state and the hidden output are likewise merged into a single state $h_t$.
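For comparison, a minimal NumPy sketch of one GRU step following the formulas above (biases are omitted for brevity; names like `gru_step` are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: update gate z_t, reset gate r_t, merged state h_t."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                    # update gate
    r_t = sigmoid(W_r @ concat)                                    # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))  # candidate
    return (1.0 - z_t) * h_prev + z_t * h_tilde                   # merged state/output

# Usage example with random parameters.
rng = np.random.default_rng(3)
input_dim, hidden_dim = 3, 4
shape = (hidden_dim, hidden_dim + input_dim)
h = np.zeros(hidden_dim)
h = gru_step(rng.standard_normal(input_dim), h,
             W_z=rng.standard_normal(shape),
             W_r=rng.standard_normal(shape),
             W_h=rng.standard_normal(shape))
print(h.shape)  # (4,)
```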

Origin: blog.csdn.net/weixin_43794311/article/details/105182494