Advanced Recurrent Neural Networks
1.GRU
2.LSTM
3.Deep RNN
4.Bidirectional RNN
1.GRU
RNN problem: during backpropagation through time (BPTT), gradients are prone to vanishing or exploding.
Gated recurrent neural networks: better capture dependencies between time steps that are far apart in a time series.
1.1 Mathematical expression
\[ \begin{aligned} \boldsymbol{R}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xr} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hr} + \boldsymbol{b}_r),\\ \boldsymbol{Z}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xz} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hz} + \boldsymbol{b}_z),\\ \tilde{\boldsymbol{H}}_t &= \text{tanh}(\boldsymbol{X}_t \boldsymbol{W}_{xh} + (\boldsymbol{R}_t \odot \boldsymbol{H}_{t-1}) \boldsymbol{W}_{hh} + \boldsymbol{b}_h),\\ \boldsymbol{H}_t &= \boldsymbol{Z}_t \odot \boldsymbol{H}_{t-1} + (1 - \boldsymbol{Z}_t) \odot \tilde{\boldsymbol{H}}_t \end{aligned} \]
1.2 Structure
- Reset gate: helps capture short-term dependencies in a time series;
- Update gate: helps capture long-term dependencies in a time series.
1.3 Implementation
- Official implementation: https://pytorch.org/docs/1.3.0/nn.html#gru
- Implementation from scratch:
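A minimal from-scratch sketch of one GRU time step, following the equations in 1.1. The helper `init_gru_params`, the initialization scale, and all sizes are illustrative assumptions, not part of the original notes:

```python
import torch

def init_gru_params(num_inputs, num_hiddens):
    """Illustrative random init for the three weight groups in 1.1."""
    def group():
        return (torch.randn(num_inputs, num_hiddens) * 0.01,
                torch.randn(num_hiddens, num_hiddens) * 0.01,
                torch.zeros(num_hiddens))
    W_xr, W_hr, b_r = group()  # reset gate
    W_xz, W_hz, b_z = group()  # update gate
    W_xh, W_hh, b_h = group()  # candidate hidden state
    return W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h

def gru_step(X, H, params):
    """One GRU time step. X: (batch, num_inputs), H: (batch, num_hiddens)."""
    W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h = params
    R = torch.sigmoid(X @ W_xr + H @ W_hr + b_r)           # reset gate R_t
    Z = torch.sigmoid(X @ W_xz + H @ W_hz + b_z)           # update gate Z_t
    H_tilde = torch.tanh(X @ W_xh + (R * H) @ W_hh + b_h)  # candidate state
    return Z * H + (1 - Z) * H_tilde                       # new hidden state H_t

# Run 5 time steps on a batch of 2 sequences (sizes are arbitrary).
params = init_gru_params(num_inputs=4, num_hiddens=8)
H = torch.zeros(2, 8)
for X in torch.randn(5, 2, 4):
    H = gru_step(X, H, params)
```

Note how the reset gate multiplies `H` before the candidate-state projection, while the update gate interpolates between the old state and the candidate.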
2.LSTM
2.1 Mathematical expression
\[ \begin{split}\begin{aligned} \boldsymbol{I}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xi} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hi} + \boldsymbol{b}_i),\\ \boldsymbol{F}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xf} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hf} + \boldsymbol{b}_f),\\ \boldsymbol{O}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xo} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{ho} + \boldsymbol{b}_o), \end{aligned}\end{split} \]
\[ \begin{split}\begin{aligned} \tilde{\boldsymbol{C}}_t &= \text{tanh}(\boldsymbol{X}_t \boldsymbol{W}_{xc} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hc} + \boldsymbol{b}_c),\\ \boldsymbol{C}_t &= \boldsymbol{F}_t \odot \boldsymbol{C}_{t-1} + \boldsymbol{I}_t \odot \tilde{\boldsymbol{C}}_t,\\ \boldsymbol{H}_t &= \boldsymbol{O}_t \odot \text{tanh}(\boldsymbol{C}_t). \end{aligned}\end{split} \]
2.2 Structure
- Forget gate ( \(\boldsymbol{F}_t\) ): controls how much of the previous time step's memory cell is retained
- Input gate ( \(\boldsymbol{I}_t\) ): controls how much of the current time step's input enters the memory cell
- Output gate ( \(\boldsymbol{O}_t\) ): controls the flow of information from the memory cell to the hidden state
- Memory cell (candidate memory cell \(\tilde{\boldsymbol{C}}_t\), memory cell \(\boldsymbol{C}_t\)): a special hidden state that carries an additional flow of information
2.3 Implementation
- Official implementation: https://pytorch.org/docs/1.3.0/nn.html#lstm
- Implementation from scratch:
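A minimal from-scratch sketch of one LSTM time step, following the equations in 2.1. As with the GRU sketch, `init_lstm_params` and all sizes are illustrative assumptions:

```python
import torch

def init_lstm_params(num_inputs, num_hiddens):
    """Illustrative random init for the four weight groups in 2.1."""
    def group():
        return (torch.randn(num_inputs, num_hiddens) * 0.01,
                torch.randn(num_hiddens, num_hiddens) * 0.01,
                torch.zeros(num_hiddens))
    # Order: input gate, forget gate, output gate, candidate cell.
    return [t for g in (group(), group(), group(), group()) for t in g]

def lstm_step(X, H, C, params):
    """One LSTM time step. Returns the new hidden state and memory cell."""
    (W_xi, W_hi, b_i, W_xf, W_hf, b_f,
     W_xo, W_ho, b_o, W_xc, W_hc, b_c) = params
    I = torch.sigmoid(X @ W_xi + H @ W_hi + b_i)     # input gate I_t
    F = torch.sigmoid(X @ W_xf + H @ W_hf + b_f)     # forget gate F_t
    O = torch.sigmoid(X @ W_xo + H @ W_ho + b_o)     # output gate O_t
    C_tilde = torch.tanh(X @ W_xc + H @ W_hc + b_c)  # candidate cell
    C = F * C + I * C_tilde                          # memory cell C_t
    H = O * torch.tanh(C)                            # hidden state H_t
    return H, C

# Run 5 time steps on a batch of 2 sequences (sizes are arbitrary).
params = init_lstm_params(num_inputs=4, num_hiddens=8)
H, C = torch.zeros(2, 8), torch.zeros(2, 8)
for X in torch.randn(5, 2, 4):
    H, C = lstm_step(X, H, C, params)
```

Unlike the GRU, the LSTM threads two states through time: the memory cell `C` is gated additively, while the hidden state `H` is a gated read-out of it.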
3.Deep RNN
3.1 Mathematical expression
\[ \boldsymbol{H}_t^{(1)} = \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(1)} + \boldsymbol{H}_{t-1}^{(1)} \boldsymbol{W}_{hh}^{(1)} + \boldsymbol{b}_h^{(1)})\\ \boldsymbol{H}_t^{(\ell)} = \phi(\boldsymbol{H}_t^{(\ell-1)} \boldsymbol{W}_{xh}^{(\ell)} + \boldsymbol{H}_{t-1}^{(\ell)} \boldsymbol{W}_{hh}^{(\ell)} + \boldsymbol{b}_h^{(\ell)})\\ \boldsymbol{O}_t = \boldsymbol{H}_t^{(L)} \boldsymbol{W}_{hq} + \boldsymbol{b}_q \]
3.2 Structure
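The stacked structure in 3.1 can be sketched with PyTorch's built-in `num_layers` argument (sizes here are illustrative assumptions):

```python
import torch
from torch import nn

# A two-layer (deep) GRU: layer l receives the hidden states of layer l-1
# as its input sequence, matching the recurrence in 3.1.
rnn = nn.GRU(input_size=8, hidden_size=16, num_layers=2)

X = torch.randn(5, 3, 8)   # (seq_len, batch, input_size)
output, h_n = rnn(X)
# output: hidden states of the TOP layer H_t^{(L)} for every time step t
# h_n: final hidden state of EACH of the 2 layers
```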
4.Bidirectional RNN
4.1 Mathematical expression
\[ \begin{aligned} \overrightarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_{t-1} \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)})\\ \overleftarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_{t+1} \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)}) \end{aligned} \]
\[ \boldsymbol{H}_t=(\overrightarrow{\boldsymbol{H}}_{t}, \overleftarrow{\boldsymbol{H}}_t) \]
\[ \boldsymbol{O}_t = \boldsymbol{H}_t \boldsymbol{W}_{hq} + \boldsymbol{b}_q \]
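The equations above can be sketched with PyTorch's `bidirectional` flag; the forward and backward hidden states are concatenated per time step, matching the definition of \(\boldsymbol{H}_t\) (sizes are illustrative assumptions):

```python
import torch
from torch import nn

# bidirectional=True runs one GRU forward in time and one backward,
# then concatenates their hidden states at each time step.
rnn = nn.GRU(input_size=8, hidden_size=16, bidirectional=True)

X = torch.randn(5, 3, 8)   # (seq_len, batch, input_size)
output, h_n = rnn(X)
# output has feature size 2 * hidden_size: (H_forward_t, H_backward_t)
# h_n holds the final hidden state of each direction separately
```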