Hands-on Learning PyTorch: Advanced Recurrent Neural Networks

Advanced Recurrent Neural Networks

1.GRU
2.LSTM
3.Deep RNN
4.Bidirectional RNN

1.GRU

Problem with vanilla RNNs: during backpropagation through time (BPTT), gradients are prone to vanishing or exploding.
Gated recurrent neural networks: use gates to better capture dependencies separated by large time-step distances in a sequence.

1.1 mathematical expression

\[ \begin{aligned} \boldsymbol{R}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xr} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hr} + \boldsymbol{b}_r),\\ \boldsymbol{Z}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xz} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hz} + \boldsymbol{b}_z),\\ \tilde{\boldsymbol{H}}_t &= \text{tanh}(\boldsymbol{X}_t \boldsymbol{W}_{xh} + (\boldsymbol{R}_t \odot \boldsymbol{H}_{t-1}) \boldsymbol{W}_{hh} + \boldsymbol{b}_h),\\ \boldsymbol{H}_t &= \boldsymbol{Z}_t \odot \boldsymbol{H}_{t-1} + (1-\boldsymbol{Z}_t) \odot \tilde{\boldsymbol{H}}_t \end{aligned} \]

1.2 Structure

  • Reset gate: helps capture short-term dependencies in a time series;
  • Update gate: helps capture long-term dependencies in a time series.

[Figure: GRU structure]

1.3 Implementation
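This section is empty in the original post; below is a minimal sketch of a single GRU time step, written directly from the equations in 1.1. All sizes and helper names (`three`, `gru_step`) are illustrative assumptions, not from the original.

```python
import torch

# Illustrative sizes (assumptions, not from the original post)
num_inputs, num_hiddens, batch = 10, 16, 4

def three():
    # One gate's parameters: input weights, hidden weights, bias
    return (torch.randn(num_inputs, num_hiddens) * 0.01,
            torch.randn(num_hiddens, num_hiddens) * 0.01,
            torch.zeros(num_hiddens))

W_xr, W_hr, b_r = three()  # reset gate
W_xz, W_hz, b_z = three()  # update gate
W_xh, W_hh, b_h = three()  # candidate hidden state

def gru_step(X, H):
    R = torch.sigmoid(X @ W_xr + H @ W_hr + b_r)           # reset gate R_t
    Z = torch.sigmoid(X @ W_xz + H @ W_hz + b_z)           # update gate Z_t
    H_tilde = torch.tanh(X @ W_xh + (R * H) @ W_hh + b_h)  # candidate state
    return Z * H + (1 - Z) * H_tilde                       # hidden state H_t

X = torch.randn(batch, num_inputs)
H = torch.zeros(batch, num_hiddens)
H = gru_step(X, H)
```

The concise equivalent is `torch.nn.GRU(input_size=num_inputs, hidden_size=num_hiddens)`, which applies this computation over a whole input sequence.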

2.LSTM

2.1 mathematical expression

\[ \begin{split}\begin{aligned} \boldsymbol{I}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xi} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hi} + \boldsymbol{b}_i),\\ \boldsymbol{F}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xf} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hf} + \boldsymbol{b}_f),\\ \boldsymbol{O}_t &= \sigma(\boldsymbol{X}_t \boldsymbol{W}_{xo} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{ho} + \boldsymbol{b}_o), \end{aligned}\end{split} \]

\[ \begin{split}\begin{aligned} \tilde{\boldsymbol{C}}_t &= \text{tanh}(\boldsymbol{X}_t \boldsymbol{W}_{xc} + \boldsymbol{H}_{t-1} \boldsymbol{W}_{hc} + \boldsymbol{b}_c),\\ \boldsymbol{C}_t &= \boldsymbol{F}_t \odot \boldsymbol{C}_{t-1} + \boldsymbol{I}_t \odot \tilde{\boldsymbol{C}}_t,\\ \boldsymbol{H}_t &= \boldsymbol{O}_t \odot \text{tanh}(\boldsymbol{C}_t). \end{aligned}\end{split} \]

2.2 Structure

  • Forget gate ( \(\boldsymbol{F}_t\) ): controls how much of the previous time step's memory cell is retained;
  • Input gate ( \(\boldsymbol{I}_t\) ): controls how much of the current time step's input flows into the memory cell;
  • Output gate ( \(\boldsymbol{O}_t\) ): controls the flow of information from the memory cell to the hidden state;
  • Memory cell (candidate memory cell \(\tilde{\boldsymbol{C}}_t\), memory cell \(\boldsymbol{C}_t\)): a special hidden state that carries information across time steps.

[Figure: LSTM structure]

2.3 Implementation
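As with the GRU, this section is empty in the original; here is a minimal sketch of one LSTM time step following the equations in 2.1. Sizes and names (`three`, `lstm_step`) are illustrative assumptions.

```python
import torch

# Illustrative sizes (assumptions, not from the original post)
num_inputs, num_hiddens, batch = 10, 16, 4

def three():
    # One gate's parameters: input weights, hidden weights, bias
    return (torch.randn(num_inputs, num_hiddens) * 0.01,
            torch.randn(num_hiddens, num_hiddens) * 0.01,
            torch.zeros(num_hiddens))

W_xi, W_hi, b_i = three()  # input gate
W_xf, W_hf, b_f = three()  # forget gate
W_xo, W_ho, b_o = three()  # output gate
W_xc, W_hc, b_c = three()  # candidate memory cell

def lstm_step(X, H, C):
    I = torch.sigmoid(X @ W_xi + H @ W_hi + b_i)     # input gate I_t
    F = torch.sigmoid(X @ W_xf + H @ W_hf + b_f)     # forget gate F_t
    O = torch.sigmoid(X @ W_xo + H @ W_ho + b_o)     # output gate O_t
    C_tilde = torch.tanh(X @ W_xc + H @ W_hc + b_c)  # candidate cell
    C = F * C + I * C_tilde                          # memory cell C_t
    H = O * torch.tanh(C)                            # hidden state H_t
    return H, C

X = torch.randn(batch, num_inputs)
H = torch.zeros(batch, num_hiddens)
C = torch.zeros(batch, num_hiddens)
H, C = lstm_step(X, H, C)
```

The concise equivalent is `torch.nn.LSTM(input_size=num_inputs, hidden_size=num_hiddens)`.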

3.Deep RNN

3.1 mathematical expression

\[ \boldsymbol{H}_t^{(1)} = \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(1)} + \boldsymbol{H}_{t-1}^{(1)} \boldsymbol{W}_{hh}^{(1)} + \boldsymbol{b}_h^{(1)})\\ \boldsymbol{H}_t^{(\ell)} = \phi(\boldsymbol{H}_t^{(\ell-1)} \boldsymbol{W}_{xh}^{(\ell)} + \boldsymbol{H}_{t-1}^{(\ell)} \boldsymbol{W}_{hh}^{(\ell)} + \boldsymbol{b}_h^{(\ell)})\\ \boldsymbol{O}_t = \boldsymbol{H}_t^{(L)} \boldsymbol{W}_{hq} + \boldsymbol{b}_q \]

3.2 Structure

[Figure: Deep RNN structure]
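In PyTorch, stacking recurrent layers as in the equations above only requires the `num_layers` argument; a minimal sketch (all sizes are illustrative assumptions):

```python
import torch
from torch import nn

# A 2-layer (deep) LSTM; all sizes are illustrative assumptions
deep_rnn = nn.LSTM(input_size=10, hidden_size=16, num_layers=2)

X = torch.randn(35, 4, 10)        # (seq_len, batch, input_size)
output, (h_n, c_n) = deep_rnn(X)
print(output.shape)  # torch.Size([35, 4, 16]): top layer's H_t at every step
print(h_n.shape)     # torch.Size([2, 4, 16]):  final H_t of each of the 2 layers
```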

4.Bidirectional RNN

4.1 mathematical expression

\[ \begin{aligned} \overrightarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(f)} + \overrightarrow{\boldsymbol{H}}_{t-1} \boldsymbol{W}_{hh}^{(f)} + \boldsymbol{b}_h^{(f)})\\ \overleftarrow{\boldsymbol{H}}_t &= \phi(\boldsymbol{X}_t \boldsymbol{W}_{xh}^{(b)} + \overleftarrow{\boldsymbol{H}}_{t+1} \boldsymbol{W}_{hh}^{(b)} + \boldsymbol{b}_h^{(b)}) \end{aligned} \]
\[ \boldsymbol{H}_t=(\overrightarrow{\boldsymbol{H}}_{t}, \overleftarrow{\boldsymbol{H}}_t) \]
\[ \boldsymbol{O}_t = \boldsymbol{H}_t \boldsymbol{W}_{hq} + \boldsymbol{b}_q \]

4.2 Structure

[Figure: Bidirectional RNN structure]
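In PyTorch, the forward and backward passes come from the `bidirectional=True` flag; a minimal sketch (sizes are illustrative assumptions):

```python
import torch
from torch import nn

# Bidirectional GRU; all sizes are illustrative assumptions
bi_rnn = nn.GRU(input_size=10, hidden_size=16, bidirectional=True)

X = torch.randn(35, 4, 10)   # (seq_len, batch, input_size)
output, h_n = bi_rnn(X)
# Forward and backward states are concatenated at each step, matching
# H_t = (H→_t, H←_t): the output feature size doubles to 2 * hidden_size.
print(output.shape)  # torch.Size([35, 4, 32])
print(h_n.shape)     # torch.Size([2, 4, 16]): one final state per direction
```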
