Understanding the Dropout Principle in Deep Learning

1. Introduction to Dropout

1.1 Why Dropout appeared

In machine learning, if a model has too many parameters and too few training samples, the trained model is prone to overfitting.

Overfitting is a problem frequently encountered when training neural networks. Concretely, an overfitted model has a small loss and high prediction accuracy on the training data, but a relatively large loss and low prediction accuracy on the test data.

Overfitting is a common problem in machine learning. If a model overfits, it is almost unusable. To combat overfitting, an ensemble of models is usually used, i.e., several models are trained and then combined. In that case training time becomes a big problem: not only is it time-consuming to train multiple models, it is also time-consuming to evaluate them.

In summary, when training deep neural networks we always run into two major drawbacks:

(1) It is easy to overfit

(2) It is time-consuming

Dropout can effectively alleviate overfitting and, to some extent, achieves a regularization effect.

    

1.2 What is Dropout?

In 2012, Hinton proposed Dropout in the paper "Improving neural networks by preventing co-adaptation of feature detectors". When a complex feed-forward neural network is trained on a small data set, it easily overfits. To prevent overfitting, the performance of the network can be improved by preventing the co-adaptation of feature detectors.

In 2012, Alex Krizhevsky, Hinton and colleagues used the Dropout algorithm in the paper "ImageNet Classification with Deep Convolutional Neural Networks" to prevent overfitting. Moreover, the AlexNet model proposed in that paper set off a boom in neural networks and won the 2012 ImageNet image recognition championship, making CNNs the core algorithmic model for image classification.

After that, several more papers on Dropout appeared, such as "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", "Improving Neural Networks with Dropout", and "Dropout as data augmentation".

From the papers above we can sense the importance of Dropout in deep learning. So what exactly is Dropout?

Dropout can be used as a trick when training deep neural networks. In each training batch, by ignoring half of the feature detectors (i.e., setting half of the hidden units to 0), overfitting can be reduced significantly. This approach weakens the interaction between feature detectors (hidden nodes), where interaction means that some detectors can only work by relying on other detectors.

Put simply, during forward propagation Dropout makes the activation of a neuron stop working with a certain probability p. This yields a model that generalizes better, because it no longer relies too heavily on particular local features, as shown in Figure 1.

Figure 1: A neural network model using Dropout
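To make this concrete, the snippet below is a minimal sketch (the activation values and the names p, y and mask are made up for illustration; the full implementation is analyzed in Section 4). "Stopping a neuron with probability p" simply means multiplying its activation by a random 0/1 mask:

import numpy as np

p = 0.5                                   # probability that a neuron stops working
y = np.array([0.8, 1.5, 0.2, 2.0, 1.1])   # example activations of one hidden layer
mask = np.random.rand(y.shape[0]) >= p    # True (keep) with probability 1 - p
y_dropped = y * mask                      # dropped neurons output 0 for this batch
print(y_dropped)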

2. Dropout workflow and use

2.1 The specific workflow of Dropout

Suppose we want to train a neural network like the one shown in Figure 2.

         

Figure 2: A standard neural network

The input is x and the output is y. The normal procedure is: we first propagate x forward through the network, then back-propagate the error to decide how to update the network parameters so that the network learns. After adopting Dropout, the procedure becomes the following:

(1) First, randomly (and temporarily) delete half of the hidden neurons in the network, while keeping the input and output neurons unchanged (the dashed units in Figure 3 are the temporarily deleted neurons).
 

  Figure 3: Some neurons temporarily deleted

(2) Then propagate the input x forward through the modified network, and back-propagate the resulting loss through the same modified network. After running this process on a small batch of training samples, the parameters (w, b) of the neurons that were not deleted are updated by stochastic gradient descent.

(3) Then keep repeating this process (a small code sketch of the loop follows this list):

       Restore the deleted neurons (the deleted neurons keep their original parameters, while the neurons that were not deleted have already been updated).
       Randomly select a half-sized subset of the hidden-layer neurons and temporarily delete them (backing up the parameters of the deleted neurons).
       For a small batch of training samples, forward-propagate, back-propagate the loss, and update the parameters (w, b) by stochastic gradient descent (only the parameters of the neurons that were not deleted are updated; the parameters of the deleted neurons keep the values they had before deletion).

Keep repeating this process.
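The following is a rough, self-contained NumPy sketch of that loop (the toy data, layer sizes and names such as W1, W2 and mask are all invented for illustration, not the network in the figures; rescaling of the activations is discussed in Section 2.2):

import numpy as np

rng = np.random.default_rng(0)

# Toy data and one hidden layer of 50 units (sizes chosen arbitrarily)
X = rng.standard_normal((64, 20))          # 64 samples, 20 features
Y = rng.standard_normal((64, 1))           # regression targets
W1, b1 = 0.1 * rng.standard_normal((20, 50)), np.zeros(50)
W2, b2 = 0.1 * rng.standard_normal((50, 1)), np.zeros(1)
lr, p_drop = 0.01, 0.5                     # learning rate and drop probability

for step in range(100):
    idx = rng.choice(64, size=16, replace=False)
    x, y = X[idx], Y[idx]

    # (1) temporarily "delete" about half of the hidden neurons for this mini-batch
    mask = rng.binomial(1, 1.0 - p_drop, size=50)

    # (2) forward propagation through the modified (thinned) network
    h = np.maximum(0, x @ W1 + b1) * mask   # ReLU activations; deleted units are forced to 0
    out = h @ W2 + b2
    loss = np.mean((out - y) ** 2)

    # back-propagate the loss through the same thinned network
    d_out = 2 * (out - y) / len(y)
    dW2, db2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * mask * (h > 0)   # deleted units receive no gradient
    dW1, db1 = x.T @ d_h, d_h.sum(0)

    # SGD update; parameters attached to deleted units stay unchanged (their gradients are 0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    # (3) at the next iteration the mask is redrawn: the deleted neurons are restored
    # and a new random half is temporarily removed

print(loss)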
 

2.2 Using Dropout in a neural network

The specific workflow of Dropout was described in detail above, but how do we make some neurons stop working (i.e., be deleted) with a certain probability? How is this implemented at the code level?

Below, we go through some of the formula derivations behind Dropout and the ideas for implementing it in code.

(1) During the training phase

Inevitably, a probabilistic step has to be added to every unit of the network during training.

Figure 4: Comparison of a standard network and a network with Dropout

The corresponding formulas change as follows:

• Formulas of a network without Dropout:

z_i^(l+1) = w_i^(l+1) · y^(l) + b_i^(l+1)
y_i^(l+1) = f(z_i^(l+1))

• Formulas of a network with Dropout:

r_j^(l) ~ Bernoulli(p)
ỹ^(l) = r^(l) * y^(l)
z_i^(l+1) = w_i^(l+1) · ỹ^(l) + b_i^(l+1)
y_i^(l+1) = f(z_i^(l+1))

In the formulas above, the Bernoulli function is used to generate the probability vector r, that is, to randomly generate a vector of 0s and 1s.
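The sketch below is a direct NumPy transcription of these formulas (the layer sizes are arbitrary and the sigmoid is just one possible choice for the activation f; none of these names come from the original text):

import numpy as np

def f(z):                        # activation function f, here a sigmoid
    return 1.0 / (1.0 + np.exp(-z))

p = 0.5                          # retention probability for Bernoulli(p)
y_prev = np.random.rand(4)       # y^(l): outputs of layer l
W = np.random.rand(3, 4)         # w^(l+1): weights of layer l+1 (3 units)
b = np.zeros(3)                  # b^(l+1)

# Without Dropout: z^(l+1) = W y^(l) + b,  y^(l+1) = f(z^(l+1))
y_next = f(W @ y_prev + b)

# With Dropout: r^(l) ~ Bernoulli(p), ỹ^(l) = r^(l) * y^(l), then the same as above
r = np.random.binomial(1, p, size=y_prev.shape)
y_tilde = r * y_prev
y_next_dropout = f(W @ y_tilde + b)
print(y_next, y_next_dropout)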

At the code level, making a neuron stop working with probability p simply means setting its activation value to 0 with probability p. For example, suppose a layer of the network has 1000 neurons whose activation outputs are y1, y2, y3, ..., y1000, and we choose a dropout ratio of 0.4. After this layer goes through dropout, roughly 400 of the 1000 neuron values will be set to 0.

Note: after masking some neurons as above so that their activations become 0, we still need to rescale the vector y1, ..., y1000, i.e., multiply it by 1/(1-p). If during training you zero out activations but do not rescale y1, ..., y1000, then at test time you have to scale the weights instead, as follows.

(2) During the test phase

When the model is used for prediction, the weight parameter of every neural unit is multiplied by the probability p (here p is the probability of keeping a unit).

Figure 5: The Dropout operation at prediction time

Dropout formula at the test phase:

W_test^(l) = p · W^(l)

That is, the learned weights of layer l are scaled by the keep probability p before being used for prediction.
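The two scaling options can also be compared directly in code. This is a sketch (with an invented keep probability, weight vector and activations) showing that rescaling by 1/(1-p) during training and multiplying the weights by the keep probability at test time both keep the expected activation unchanged:

import numpy as np

rng = np.random.default_rng(1)
drop_prob = 0.4
keep_prob = 1.0 - drop_prob           # the p that the weights are multiplied by at test time
w = rng.standard_normal(1000)         # weights of one unit (illustrative values)
x = rng.standard_normal(1000)         # activations feeding into that unit
mask = rng.binomial(1, keep_prob, size=1000)

# Option A ("inverted dropout"): rescale by 1/(1 - drop_prob) during training, do nothing at test time
train_a = ((x * mask) / keep_prob) @ w
test_a = x @ w

# Option B: no rescaling during training, multiply the weights by keep_prob at test time
train_b = (x * mask) @ w
test_b = x @ (keep_prob * w)

# Averaged over many random masks, train_a matches test_a and train_b matches test_b
print(test_a, test_b)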

3. Why can Dropout solve overfitting?

(1) An averaging effect: let us first go back to the standard model without dropout. If we train 5 different neural networks with the same training data, we will generally get 5 different results, and we can then decide the final answer by "averaging the 5 results" or by a "majority vote" strategy. For example, if 3 of the networks say the digit is 9, then the true answer is very likely 9, and the other two networks gave wrong answers. This "combine and average" strategy can usually prevent overfitting quite effectively, because different networks may overfit in different ways, and averaging can let some "opposite" fits cancel each other out. Dropping different hidden neurons is similar to training different networks: randomly deleting half of the hidden neurons makes the network structure different each time, so the whole dropout procedure is equivalent to averaging many different neural networks. Since different networks overfit differently, some "opposite" fits cancel out, which reduces overfitting overall.

(2) Reducing complex co-adaptations between neurons: because of dropout, two neurons will not necessarily appear in the same dropout network every time. Weight updates therefore no longer rely on the joint action of hidden nodes with fixed relationships, which prevents situations in which some features are only useful in the presence of other specific features. The network is forced to learn more robust features that are also useful in random subsets of the other neurons. In other words, if our network is making a prediction, it should not be overly sensitive to particular fragments of evidence; even when a specific clue is missing, it should still be able to learn common features from the many remaining clues. From this point of view, dropout is a bit like L1 and L2 regularization: shrinking the weights makes the network more robust to the loss of specific neuron connections.

(3) Dropout plays a role similar to that of sex in biological evolution: to survive, a species tends to adapt to its environment, but a sudden environmental change can make it hard for the species to react in time. Sexual reproduction can produce variants that adapt to the new environment, effectively preventing overfitting, that is, avoiding the extinction the species might face when the environment changes.

4. Source code analysis of Dropout in Keras

Below, we analyze the source code that implements Dropout in Keras.

The GitHub address of the Keras open-source project is:

https://github.com/fchollet/keras/tree/master/keras

The file that contains the implementation of the Dropout function is:

https://github.com/fchollet/keras/blob/master/keras/backend/theano_backend.py

The Dropout implementation function is as follows:
 

Figure 6: The implementation of Dropout in Keras

We make a few modifications to the Keras Dropout implementation so that the dropout function can be run on its own.

# coding:utf-8
import numpy as np

# Implementation of the dropout function
def dropout(x, level):
    if level < 0. or level >= 1:  # level is the drop probability and must lie in [0, 1)
        raise ValueError('Dropout level must be in interval [0, 1[.')
    retain_prob = 1. - level

    # We use the binomial function to generate a vector with the same shape as x.
    # The binomial function is like flipping a coin: each neuron is one coin,
    # the probability of "heads" is p, and n is the number of trials per neuron.
    # Each neuron only needs to be flipped once, so n=1; size is how many coins we have.
    random_tensor = np.random.binomial(n=1, p=retain_prob, size=x.shape)
    # This produces a 0/1 vector: a 0 means the neuron is masked and stops working, i.e. dropped out
    print(random_tensor)

    x *= random_tensor
    print(x)
    x /= retain_prob

    return x

# A test of dropout: run the function above to see what an input vector x looks like after dropout
x = np.asarray([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=np.float32)
dropout(x, 0.4)

  

In the function, x holds the activation values of the current layer, and level is the probability that each neuron will be dropped.

Note: the Keras implementation of Dropout masks some neurons (setting their activations to 0) and then scales up the activation vector x1, ..., x1000, i.e., multiplies it by 1/(1-p).

Question: we have now seen two ways of doing the scaling for Dropout, so why does Dropout need scaling at all?

Because we randomly drop some neurons during training, but at prediction time we cannot drop them randomly. If we dropped neurons at test time, the results would be unstable: for the same test input the output might sometimes be a and sometimes be b. Such unstable results are unacceptable for a real system, and users might conclude that the model's predictions are inaccurate. One way to "compensate" is to multiply every neuron's weight by p (here p is the probability that the neuron is kept), so that on the whole the test data and the training data behave roughly the same. For example, if a neuron's output is x, then during training it participates with probability p and is dropped with probability (1-p), so its expected output is p·x + (1-p)·0 = p·x. Multiplying this neuron's weight by p at test time therefore gives the same expectation.
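That expectation argument is easy to check numerically. The following sketch (with arbitrary values; as in the paragraph above, p here is the keep probability) averages the dropped output over many random masks and compares it with p·x:

import numpy as np

rng = np.random.default_rng(42)
p = 0.6                   # probability that the neuron is kept
x = 2.5                   # the neuron's output before dropout
trials = 100000

kept = rng.binomial(1, p, size=trials)   # 1 with probability p, 0 with probability 1-p
mean_output = np.mean(kept * x)          # Monte Carlo estimate of the expected output

print(mean_output, p * x)                # both are close to 1.5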

Summary:

At present Dropout is widely used in fully connected networks, and the drop rate is generally set to 0.5 or 0.3. In the hidden layers of convolutional networks, Dropout is used less, because convolution itself already induces sparsity and the sparsifying ReLU activation is used heavily. Overall, Dropout is a hyperparameter that has to be tuned for the specific network and the specific application domain.
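As a usage-level illustration (a minimal sketch that assumes the current tf.keras API rather than the older standalone Keras backend referenced above), adding Dropout to a fully connected network is just an extra layer with the chosen rate; Keras applies it only during training and disables it automatically at inference time:

import tensorflow as tf

# A small fully connected classifier with Dropout after each hidden layer.
# The rate 0.5 is the drop probability discussed above; it is a hyperparameter to tune per task.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()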


 

References:

 

Hinton G E, Srivastava N, Krizhevsky A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J]. arXiv preprint arXiv:1207.0580, 2012.

Krizhevsky A, Sutskever I, Hinton G E. Imagenet classification with deep convolutional neural networks[C]//Advances in neural information processing systems. 2012: 1097-1105.

Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. The Journal of Machine Learning Research, 2014, 15(1): 1929-1958.

Srivastava N. Improving neural networks with dropout[J]. University of Toronto, 2013, 182.

Bouthillier X, Konda K, Vincent P, et al. Dropout as data augmentation[J]. arXiv preprint arXiv:1506.08700, 2015.

Deep Learning (22): A shallow understanding and implementation of Dropout, https://blog.csdn.net/hjimce/article/details/50413257

Understanding dropout, https://blog.csdn.net/stdcoutzyx/article/details/49022443

Dropout for solving overfitting - an article by Xiaolei on Zhihu, https://zhuanlan.zhihu.com/p/23178423

Li Li: Dropout in convolutional neural networks, https://blog.csdn.net/qunnie_yi/article/details/80128463

The Dropout principle, with a brief code analysis, https://blog.csdn.net/whiteinblue/article/details/37808623

Deep learning (41): A simple understanding of Dropout, https://www.cnblogs.com/tornadomeet/p/3258122.html?_t_t_t=0.09445037946091872
 


 

This article is adapted from: https://blog.csdn.net/program_developer/article/details/80737724

 
