RNN illustration
This is the network structure. Because it is time-dependent, it can be unrolled along the time axis for a clearer view.
Unrolled over time steps
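As a sketch of the unrolling idea, here is a minimal vanilla RNN forward pass in NumPy; the dimensions, initialization, and function name are illustrative assumptions, not from the original figure:

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, bh):
    """Unroll a vanilla RNN over time: h_t = tanh(Wxh @ x_t + Whh @ h_{t-1} + bh)."""
    h = h0
    hs = []
    for x in xs:              # one step per time index
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        hs.append(h)
    return hs

rng = np.random.default_rng(0)
D, H, T = 3, 4, 5             # input dim, hidden dim, sequence length
xs = [rng.standard_normal(D) for _ in range(T)]
hs = rnn_forward(xs, np.zeros(H),
                 rng.standard_normal((H, D)) * 0.1,
                 rng.standard_normal((H, H)) * 0.1,
                 np.zeros(H))
print(len(hs), hs[0].shape)   # one hidden state per time step
```

The same weight matrices are reused at every time step; unrolling only replicates the cell, not the parameters.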
LSTM
References: reference 1, reference 2, reference 3
Why tanh?
To overcome the vanishing-gradient problem, we need a function whose second derivative can sustain over a long range before going to zero. tanh is a suitable function with this property.
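A quick numeric check of this property (my own illustration, not from the original): the derivative of tanh peaks at 1.0, while sigmoid's peaks at 0.25, so repeated tanh layers shrink gradients more slowly:

```python
import numpy as np

x = np.linspace(-5, 5, 1001)
tanh_grad = 1 - np.tanh(x) ** 2        # d/dx tanh(x)
sig = 1 / (1 + np.exp(-x))
sig_grad = sig * (1 - sig)             # d/dx sigmoid(x)

print(tanh_grad.max())                 # 1.0, reached at x = 0
print(sig_grad.max())                  # 0.25, reached at x = 0
```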
Why use Sigmoid?
Since the Sigmoid function outputs values between 0 and 1, it can be used to decide how much information to forget or remember.
The effect of the three gates:
When f_t = 0 and i_t = 1, the history information contained in the previous internal state c_{t-1} is discarded (the history is cleared), and the internal state c_t records only the candidate state at time t.
When f_t = 1 and i_t = 0, the internal state c_t simply copies the previous internal state c_{t-1}, without writing in the new information carried by the input x_t.
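These two extremes can be checked numerically with the cell update c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t; the state values below are made up for illustration:

```python
import numpy as np

c_prev = np.array([1.0, -2.0, 3.0])     # previous internal state c_{t-1}
c_cand = np.array([0.5,  0.5, -0.5])    # candidate state c~_t at time t

def cell_update(f, i, c_prev, c_cand):
    # c_t = f_t * c_{t-1} + i_t * c~_t (elementwise)
    return f * c_prev + i * c_cand

# f_t = 0, i_t = 1: history is cleared, only the candidate survives
print(cell_update(0.0, 1.0, c_prev, c_cand))   # equals c_cand

# f_t = 1, i_t = 0: the state just copies c_{t-1}, ignoring x_t
print(cell_update(1.0, 0.0, c_prev, c_cand))   # equals c_prev
```

In practice the gates take intermediate values in (0, 1), blending the two behaviors.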
Forget gate equation:
f_t = σ(W_f · [h_{t-1}, x_t] + b_f)
where W_f · [h_{t-1}, x_t] can equivalently be written as W_f h_{t-1} + U_f x_t, splitting the concatenated form into two separate weight matrices.
Input gate: i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
Memory cell state: c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t, with candidate state c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)
Output gate: o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
Final output: h_t = o_t ⊙ tanh(c_t)
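Putting the formulas above together, a minimal NumPy LSTM cell can be sketched as follows; the stacked weight layout, shapes, and initialization are my assumptions, while the gate equations follow the standard form:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4H, H+D), b: (4H,), gates stacked as [f, i, o, g]."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0*H:1*H])          # forget gate f_t
    i = sigmoid(z[1*H:2*H])          # input gate i_t
    o = sigmoid(z[2*H:3*H])          # output gate o_t
    g = np.tanh(z[3*H:4*H])          # candidate state c~_t
    c = f * c_prev + i * g           # memory cell state c_t
    h = o * np.tanh(c)               # final output h_t
    return h, c

rng = np.random.default_rng(1)
D, H = 3, 4                          # input dim, hidden dim (illustrative)
W = rng.standard_normal((4 * H, H + D)) * 0.1
b = np.zeros(4 * H)
h, c = lstm_step(rng.standard_normal(D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)
```

Because h_t = o_t ⊙ tanh(c_t) with o_t in (0, 1), every component of the output lies strictly inside (-1, 1).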
GRU
Formula:
The LSTM's three gates (input gate, forget gate, output gate) become two: the forget gate and the input gate are merged into an update gate (Update Gate), and a reset gate (Reset Gate) is added.
The cell state and the output are also merged into a single state:
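For reference, the standard GRU formulas can be written as follows (this is the textbook form of the gates named above, not reproduced from the original images):

```latex
z_t = \sigma(W_z \cdot [h_{t-1}, x_t])                      % update gate
r_t = \sigma(W_r \cdot [h_{t-1}, x_t])                      % reset gate
\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])       % candidate state
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t       % merged state update
```

The single state h_t plays the role of both the LSTM's cell state c_t and its output h_t, with the update gate z_t interpolating between keeping the old state and writing the candidate.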