Dropout (regularization)

Introduction to Dropout

1. When a machine learning model has too many parameters and too few training samples, the trained model is prone to overfitting. Overfitting shows up as follows: the model has a small loss and high prediction accuracy on the training data, but a relatively large loss and low prediction accuracy on the test data.
2. Dropout: during the forward pass, the activation of each neuron is made to stop working with some probability p. This makes the model generalize better, because it can no longer rely too heavily on particular local features.

Dropout workflow and usage

Suppose we want to train the neural network shown in Figure 2:
[Figure 2]
The normal process is: x is the input and y is the output. We first propagate x forward through the network, then back-propagate the error to decide how to update the network's parameters. After adding dropout, the process becomes the following:
(1) First, randomly (and temporarily) delete half of the hidden neurons in the network, while keeping the input and output neurons unchanged (the dashed neurons in Figure 3 are the temporarily deleted ones).
[Figure 3: network with some hidden neurons temporarily deleted (dashed)]
(2) Then propagate the input x forward through the modified network, and back-propagate the resulting loss through the same modified network. After running this process on a small batch of training samples, update the parameters (w, b) of the neurons that were not deleted using stochastic gradient descent.
(3) Then keep repeating steps (1) and (2): restore the deleted neurons (at this point the deleted neurons keep their original parameters, while the neurons that were not deleted have already been updated). A minimal sketch of one such training step follows below.
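To make the workflow concrete, here is a minimal NumPy sketch of one dropout training step, assuming a single hidden layer, ReLU activations, a squared-error loss, and a drop probability p; the layer sizes, learning rate, and toy data are illustrative, not from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                   # probability that a hidden neuron is dropped

# One hidden layer: x -> h -> y_hat (sizes are illustrative)
W1, b1 = 0.1 * rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = 0.1 * rng.normal(size=(1, 4)), np.zeros(1)

def train_step(x, y, lr=0.1):
    # Step (1): randomly (temporarily) delete hidden neurons; mask entry 0 = dropped.
    mask = (rng.random(4) > p).astype(float)

    # Step (2a): forward pass through the modified (thinned) network.
    h = np.maximum(0.0, W1 @ x + b1)      # ReLU hidden activations
    h_tilde = h * mask                    # dropped neurons output 0
    y_hat = W2 @ h_tilde + b2
    loss = 0.5 * np.sum((y_hat - y) ** 2)

    # Step (2b): back-propagate through the same thinned network; gradients
    # flow only through surviving neurons because d(h_tilde)/dh is the mask.
    d_y = y_hat - y
    dW2 = np.outer(d_y, h_tilde)          # columns for dropped neurons are zero
    db2 = d_y
    dh = (W2.T @ d_y) * mask * (h > 0.0)
    dW1 = np.outer(dh, x)                 # rows for dropped neurons are zero
    db1 = dh

    # SGD update: weights attached to dropped neurons receive zero gradient,
    # so only the surviving neurons' parameters change, matching step (3).
    for P, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        P -= lr * g
    return loss

# One step on a toy sample:
print(train_step(x=np.array([1.0, -0.5, 0.3]), y=np.array([1.0])))
```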

Using Dropout in a neural network

Without dropout, a standard feed-forward layer computes:

$$z_i^{(l+1)} = \mathbf{w}_i^{(l+1)} \mathbf{y}^{(l)} + b_i^{(l+1)}, \qquad y_i^{(l+1)} = f\!\left(z_i^{(l+1)}\right)$$

With dropout, a Bernoulli mask $\mathbf{r}^{(l)}$ first thins the layer's outputs (consistent with the definition above, each neuron is dropped with probability $p$, i.e. kept with probability $1-p$):

$$r_j^{(l)} \sim \mathrm{Bernoulli}(1-p), \qquad \tilde{\mathbf{y}}^{(l)} = \mathbf{r}^{(l)} * \mathbf{y}^{(l)}$$

$$z_i^{(l+1)} = \mathbf{w}_i^{(l+1)} \tilde{\mathbf{y}}^{(l)} + b_i^{(l+1)}, \qquad y_i^{(l+1)} = f\!\left(z_i^{(l+1)}\right)$$
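In practice one rarely writes the mask by hand; deep learning frameworks expose dropout as a layer. As an example, PyTorch's `nn.Dropout(p)` zeroes each element with probability p during training and rescales the survivors by 1/(1-p) ("inverted dropout"), so no extra scaling is needed at test time. The toy model below is a sketch with assumed layer sizes, not from the original post.

```python
import torch
import torch.nn as nn

# Toy model with dropout after the hidden layer (sizes are illustrative).
model = nn.Sequential(
    nn.Linear(20, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # mask r ~ Bernoulli(1 - p), as in the formula above
    nn.Linear(50, 10),
)

x = torch.randn(8, 20)

model.train()            # training mode: dropout active, outputs are stochastic
y_train = model(x)

model.eval()             # evaluation mode: dropout is the identity
y_eval = model(x)
```

Switching between `model.train()` and `model.eval()` is what toggles the dropout behavior between training and testing.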

Why can Dropout reduce overfitting?

(1) An averaging effect: Go back to the standard model without dropout. If we train five different neural networks on the same training data, we will generally get five different results, and we can then decide the final answer by averaging the five outputs or by a majority vote. For example, if three of the networks say the digit is a 9, the true answer is most likely 9, and the other two networks gave wrong results. This "combine and average" strategy is usually effective at preventing overfitting, because different networks may overfit in different ways, and averaging lets some of these "opposite" fits cancel each other out. Dropping out different hidden neurons is like training different networks: randomly deleting half of the hidden neurons changes the network structure, so the whole dropout procedure amounts to averaging over many different neural networks. Since different networks overfit differently, fits that are "opposite" to one another cancel out, reducing overfitting overall. (A short code illustration of this averaging view follows after these three points.)

(2) Reducing complex co-adaptations between neurons: Because of dropout, two neurons do not necessarily appear in the same thinned network every time. Weight updates therefore no longer rely on the joint action of hidden nodes with fixed relationships, which prevents situations where some features are only useful in the presence of other specific features. This forces the network to learn more robust features, ones that are also useful in random subsets of the other neurons. In other words, if our neural network is making some prediction, it should not be overly sensitive to particular fragments of evidence; even if a specific clue is lost, it should still be able to learn common features from the many other clues. From this point of view, dropout is somewhat like L1/L2 regularization: it shrinks the weights and thereby makes the network more robust to the loss of any particular connection.

(3) Dropout is analogous to the role of sex in biological evolution: To survive, a species tends to adapt to its environment, but a sudden environmental change can leave it unable to respond in time. The emergence of sex allows a species to produce variants adapted to the new environment, effectively preventing "overfitting", that is, avoiding the extinction the species might face when the environment changes.
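To illustrate the averaging view from point (1) in code: keeping dropout active at prediction time and averaging several stochastic forward passes approximates averaging an ensemble of thinned networks. This reuses the toy PyTorch `model` sketched above and is only an illustration, not part of the original post.

```python
# Each forward pass in training mode samples a different thinned sub-network;
# averaging their outputs mimics the "train several networks and average" idea.
model.train()                         # keep dropout active on purpose
with torch.no_grad():
    passes = torch.stack([model(x) for _ in range(5)])  # 5 "different networks"
    ensemble_average = passes.mean(dim=0)               # average their results
```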


Source: blog.csdn.net/qq_41627642/article/details/104662952