Using Dropout and Layer Normalization in Deep Learning


Papers for the two techniques:

Dropout:http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

Layer Normalization:  https://arxiv.org/abs/1607.06450


RECURRENT NEURAL NETWORK REGULARIZATION https://arxiv.org/pdf/1409.2329.pdf


Implementations of both (using Nematus as the example):

https://github.com/EdinburghNLP/nematus/blob/master/nematus/layers.py


Where dropout is applied in the GRU:
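The code excerpt that appeared here did not survive extraction. As a rough placeholder, here is a minimal numpy sketch of the pattern used in the Theano Nematus gru_layer: dropout masks are applied to the input and to the previous hidden state right before the affine transforms, and the masks are sampled once per batch rather than per timestep. Names like W, U, Wx, Ux follow common GRU/Nematus conventions, but the code is illustrative, not the library's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dropout_mask(shape, retain_p, rng):
    """Inverted-dropout mask, sampled once per batch and reused at every timestep."""
    return rng.binomial(1, retain_p, size=shape).astype(np.float32) / retain_p

def gru_step(x_t, h_prev, W, U, Wx, Ux, mask_x, mask_h):
    """One GRU step with dropout applied to the inputs of the affine transforms.

    x_t:    (batch, in_dim)  input at this timestep
    h_prev: (batch, dim)     previous hidden state
    W, Wx:  input weights for the gates / candidate state
    U, Ux:  recurrent weights for the gates / candidate state
    """
    dim = h_prev.shape[-1]
    preact = np.dot(x_t * mask_x, W) + np.dot(h_prev * mask_h, U)   # gate pre-activations
    r = sigmoid(preact[:, :dim])                                    # reset gate
    u = sigmoid(preact[:, dim:])                                    # update gate
    preactx = np.dot(x_t * mask_x, Wx) + r * np.dot(h_prev * mask_h, Ux)
    h_tilde = np.tanh(preactx)                                      # candidate state
    return u * h_prev + (1.0 - u) * h_tilde                         # interpolate with h_prev
```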




Operations in the readout layer:
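The readout excerpt is likewise missing. Below is a minimal sketch of the usual readout pattern in attention-based NMT decoders, and roughly what the Nematus ff_logit layers do: combine the decoder state, the previous target embedding and the attention context with a tanh layer, apply dropout, then project to the vocabulary. All names here (W_h, W_e, W_c, W_logit, ...) are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def readout(h_t, emb_prev, ctx_t, W_h, W_e, W_c, W_logit, drop_mask):
    """Readout: merge decoder state, previous target embedding and attention
    context, squash with tanh, apply dropout, then project to vocabulary logits."""
    logit = np.tanh(np.dot(h_t, W_h) + np.dot(emb_prev, W_e) + np.dot(ctx_t, W_c))
    logit = logit * drop_mask                      # dropout before the output projection
    return softmax(np.dot(logit, W_logit))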



Questions:

1. Why is Dropout placed before LN?

Other sources do not use this order:

https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout-in-tensorflow

BatchNorm -> ReLU (or other activation) -> Dropout
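For concreteness, the two orderings side by side in a minimal numpy sketch. The StackOverflow answer talks about BatchNorm; LayerNorm stands in for it here, and the dropout/layer_norm helpers are illustrative.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return gain * (x - mean) / (std + eps) + bias

def dropout(x, retain_p, rng):
    return x * rng.binomial(1, retain_p, size=x.shape) / retain_p

rng = np.random.RandomState(0)
x = rng.randn(8, 16).astype(np.float32)
gain, bias = np.ones(16), np.zeros(16)

# Order observed in the Nematus code being questioned: dropout first, then LN.
y_nematus_style = layer_norm(dropout(x, 0.8, rng), gain, bias)

# Order recommended in the StackOverflow thread (for BatchNorm):
# normalize -> activation -> dropout.
y_common_style = dropout(np.maximum(layer_norm(x, gain, bias), 0.0), 0.8, rng)
```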


2. Why do state_below_ and pctx_ also get LN? (No activation function is applied to them directly afterwards.)

In gru_layer, state_below_ does get LN (the input is src):


In gru_cond_layer, state_below_ does not get LN (the input is trg):
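To make the question concrete, here is a minimal illustrative sketch of what "LN on state_below_" amounts to in gru_layer: the input projection state_below_ = x·W + b is normalized on its own, even though the sigmoid/tanh nonlinearity is only applied after it has been summed with the (also normalized) recurrent term. The names are in the spirit of the Nematus code, but this is not its implementation.

```python
import numpy as np

def layer_norm(x, gain, bias, eps=1e-5):
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return gain * (x - mean) / (std + eps) + bias

def gru_gate_preactivation(x_t, h_prev, W, b, U, g_x, b_x, g_h, b_h):
    """Gate pre-activation with LN applied separately to the input projection
    (state_below_) and to the recurrent projection before they are summed."""
    state_below_ = layer_norm(np.dot(x_t, W) + b, g_x, b_x)   # LN on the input side
    rec_ = layer_norm(np.dot(h_prev, U), g_h, b_h)            # LN on the recurrent side
    return state_below_ + rec_        # the sigmoid is only applied to this sum later
```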



3. Dropout masks cannot be generated inside scan:

https://groups.google.com/forum/#!topic/lasagne-users/3eyaV3P0Y-E

https://groups.google.com/forum/#!topic/theano-users/KAN1j7iey68
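A minimal Theano sketch of the workaround discussed in those threads: sample the dropout mask once, outside scan, and pass it in as a non_sequence, rather than calling the random stream inside the step function. Shapes, names and the toy transition are illustrative.

```python
import numpy as np
import theano
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams

dim = 4
retain_p = 0.8
trng = MRG_RandomStreams(seed=1234)

x_seq = T.tensor3('x_seq')                       # (n_steps, batch, dim)
h0 = T.matrix('h0')                              # (batch, dim)
W = theano.shared(np.random.randn(dim, dim).astype('float32'), name='W')

# Sample ONE mask per sequence, outside scan, and reuse it at every timestep.
drop_mask = trng.binomial(size=x_seq[0].shape, p=retain_p,
                          n=1, dtype='float32') / retain_p

def step(x_t, h_prev, mask, W):
    # The mask arrives as a non_sequence; no RNG call happens inside scan.
    return T.tanh(x_t + T.dot(h_prev * mask, W))

h_seq, _ = theano.scan(step,
                       sequences=x_seq,
                       outputs_info=h0,
                       non_sequences=[drop_mask, W])
```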


4. Dropout in RNN

RECURRENT NEURAL NETWORK REGULARIZATION says that dropout should not be applied to the previous hidden state passed into the cell (Figure 2), but Nematus does apply it...
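To spell out the difference, a minimal numpy sketch: in the scheme of that paper, dropout is applied only to the non-recurrent (input) connection and the previous hidden state passes through unmasked, whereas the Nematus-style step also masks h_prev. Both step functions are illustrative.

```python
import numpy as np

def rnn_step_zaremba(x_t, h_prev, W, U, drop_mask_x):
    """Zaremba et al. style: dropout only on the input (non-recurrent) connection;
    the previous hidden state enters the transition unmasked."""
    return np.tanh(np.dot(x_t * drop_mask_x, W) + np.dot(h_prev, U))

def rnn_step_nematus_style(x_t, h_prev, W, U, drop_mask_x, drop_mask_h):
    """Nematus-style (the point questioned above): the recurrent state is masked too."""
    return np.tanh(np.dot(x_t * drop_mask_x, W) + np.dot(h_prev * drop_mask_h, U))
```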


5. Residual connections

Regarding residual connections, https://github.com/harvardnlp/seq2seq-attn says: "res_net: Use residual connections between LSTM stacks whereby the input to the l-th LSTM layer is the hidden state of the l-1-th LSTM layer summed with the hidden state of the l-2-th LSTM layer. We didn't find this to really help in our experiments."
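Read literally, that option makes the input of layer l the sum of the hidden states of layers l-1 and l-2 (counting the original input as layer 0). A minimal numpy sketch with a stand-in layer function:

```python
import numpy as np

def lstm_layer(x_seq, W):
    """Stand-in for a real LSTM layer: any seq -> seq map of the same width."""
    return np.tanh(np.dot(x_seq, W))

def stacked_with_residuals(x_seq, weights):
    """Residual connections between LSTM stacks as described in the quote:
    input to layer l = hidden states of layer l-1 + hidden states of layer l-2."""
    outputs = [x_seq]                 # outputs[l] = hidden states of layer l (0 = input)
    for l, W in enumerate(weights, start=1):
        inp = outputs[l - 1]
        if l >= 2:                    # from the second stacked layer on, add layer l-2
            inp = inp + outputs[l - 2]
        outputs.append(lstm_layer(inp, W))
    return outputs[-1]
```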
