Deep Learning Algorithms (No. 7) ---- Avoiding Overfitting in Deep Learning (Regularization)

Welcome to follow the WeChat public account "Smart Algorithm" - the original post (for a better reading experience):

Deep Learning Algorithms (No. 7) ---- Avoiding Overfitting in Deep Learning (Regularization)

In the previous issue, we studied learning rates in deep learning:

Deep Learning Algorithms (No. 6) ---- Learning Rates in Deep Learning

Today we look at how to avoid overfitting in deep learning, and we hope for more exchanges and common progress. This issue covers the following topics:

  • Early stopping
  • L1 and L2 norm regularization
  • Dropout
  • Max-norm regularization
  • Data augmentation
  • Summary

We know that a deep network has thousands, even millions, of parameters. With so many parameters the network has an incredible degree of freedom and can fit very complex data sets. In other words, this great flexibility means it can easily overfit the training set. Below we look at a few common ways of avoiding overfitting: early stopping, L1 and L2 norm regularization, Dropout, max-norm regularization, and data augmentation.

1. Early Stopping

To avoid overfitting the training set, a good approach is to stop training the network before it starts to overfit (this was introduced in an earlier article). That is, stop training when the performance on the validation set begins to decline.

In TensorFlow, one way to do this is to evaluate the model on the validation set at regular intervals during training and save a snapshot whenever it beats the previous best; at the end of training, restore the best snapshot saved along the way. Although early stopping works quite well on its own in practice, it generally performs even better when combined with other regularization techniques.
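A minimal sketch of this bookkeeping in TensorFlow 1.x is shown below. This is not the original code from the post: the model, training_op, loss, the placeholders X and y, the validation data and the next_batch() helper are all assumed to exist, and the evaluation interval and patience values are purely illustrative.

```python
import numpy as np
import tensorflow as tf

saver = tf.train.Saver()
best_val_loss = np.infty          # best validation loss seen so far
checks_without_progress = 0       # evaluations since the last improvement
max_checks_without_progress = 20  # give up after 20 checks with no progress

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(n_steps):
        X_batch, y_batch = next_batch(batch_size)   # assumed mini-batch helper
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if step % 100 == 0:                         # evaluate at regular intervals
            val_loss = sess.run(loss, feed_dict={X: X_valid, y: y_valid})
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                checks_without_progress = 0
                saver.save(sess, "./best_model.ckpt")   # keep the best model so far
            else:
                checks_without_progress += 1
                if checks_without_progress > max_checks_without_progress:
                    print("Early stopping")
                    break
```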

 

2. L1 and L2 Norm Regularization

Just as we did earlier with linear models, we can constrain a neural network's weights with the L1 or L2 norm (typically not the bias terms). Implementing this in TensorFlow is straightforward: we only need to add the appropriate regularization term to the loss function. For example, suppose our network has a single hidden layer with weights weights1 and an output layer with weights weights2. We can then apply L1 regularization to these weights, along the lines of the sketch below:
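The original snippet was not preserved, so this is only a minimal sketch of the idea, assuming weights1 and weights2 are the weight variables of the hidden layer and the output layer and base_loss is the ordinary (e.g. cross-entropy) loss; the scale value is illustrative:

```python
scale = 0.001  # illustrative regularization strength

# L1 penalty on both weight matrices (the bias terms are left out)
reg_loss = tf.reduce_sum(tf.abs(weights1)) + tf.reduce_sum(tf.abs(weights2))
loss = tf.add(base_loss, scale * reg_loss, name="loss")
```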

The approach above works, but it becomes inconvenient when the network has many layers. Fortunately, TensorFlow offers a better way to regularize multiple layers: many of the functions that create variables accept a regularizer argument at creation time. We can pass in a function that takes the weights as a parameter and returns the corresponding regularization loss, along these lines:
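Here is a sketch of this pattern with the TensorFlow 1.x contrib API; arg_scope applies the same weights_regularizer to every layer it wraps. The input placeholder X, the layer sizes and the scope names are illustrative assumptions, not the original code:

```python
from tensorflow.contrib.framework import arg_scope
from tensorflow.contrib.layers import fully_connected, l1_regularizer

scale = 0.001  # illustrative regularization strength

with arg_scope([fully_connected],
               weights_regularizer=l1_regularizer(scale=scale)):
    hidden1 = fully_connected(X, 300, scope="hidden1")
    hidden2 = fully_connected(hidden1, 50, scope="hidden2")
    logits = fully_connected(hidden2, 10, activation_fn=None, scope="out")
```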

 

The code above creates a neural network with two hidden layers and one output layer, and for every layer it also creates graph nodes that compute the L1 regularization loss of that layer's weights. TensorFlow automatically adds all of these regularization losses to a special collection. We only need to add them to the overall loss, as follows:
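For example (base_loss is assumed to be the plain cross-entropy loss defined elsewhere):

```python
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([base_loss] + reg_losses, name="loss")
```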

 

3. Dropout

The most popular regularization technique for deep neural networks, however, is Dropout, proposed by G. E. Hinton in 2012 and later developed in detail (and shown to be highly successful) by Nitish Srivastava and colleagues. Even an already strong network can often gain one to two percentage points of accuracy by adding Dropout. That may not sound like much, but if a model is already at 95% accuracy, gaining two percentage points means the error rate drops by 40% (from 5% to 3%).

The algorithm itself is very simple: at every training step, every neuron (including the input neurons but excluding the output neurons) has a probability p of being temporarily dropped. A dropped neuron is ignored during that training step, but it may be active again at the next step. This hyperparameter p is called the dropout rate and is typically set to 50%.

At first glance this rather brutal approach seems like it should not work, but it turns out that it does. Because neurons are dropped at random, each neuron learns not to rely too heavily on any particular other neurons and instead has to be as useful as possible on its own. The resulting network is less sensitive to slight changes in the input, which makes it much more robust.

During training we randomly drop neurons with probability p, but at prediction time we cannot drop them randomly: doing so would make the results unstable, i.e., for the same test input the model might sometimes output a and sometimes output b, which a real system cannot accept (users would conclude the model is inaccurate). One way to "compensate" is to multiply every neuron's weights by (1 - p) at test time, or equivalently to divide the activations by (1 - p) during training, so that on average the test-time and training-time signals match. For example, if a neuron's output is x, then during training it participates with probability (1 - p) and is dropped with probability p, so its expected output is (1 - p)·x + p·0 = (1 - p)·x. Multiplying that neuron's weights by (1 - p) at test time therefore yields the same expectation.

How do we use dropout in TensorFlow? We simply apply the dropout function to the input layer and to the output of every hidden layer. During training this function randomly sets some neurons to 0 and automatically rescales the rest (dividing by the keep probability 1 - p). The sketch below shows how dropout regularization might be applied in TensorFlow:
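A minimal sketch with the tf.contrib.layers API (TensorFlow 1.x); the input placeholder X, the layer sizes and the scope names are illustrative. Note that in this API keep_prob is the probability of keeping a neuron, i.e. 1 - p:

```python
from tensorflow.contrib.layers import dropout, fully_connected

is_training = tf.placeholder(tf.bool, shape=(), name="is_training")
keep_prob = 0.5  # keep probability, i.e. a dropout rate of 50%

# Dropout on the inputs and after each hidden layer (never on the outputs)
X_drop = dropout(X, keep_prob, is_training=is_training)
hidden1 = fully_connected(X_drop, 300, scope="hidden1")
hidden1_drop = dropout(hidden1, keep_prob, is_training=is_training)
hidden2 = fully_connected(hidden1_drop, 50, scope="hidden2")
hidden2_drop = dropout(hidden2, keep_prob, is_training=is_training)
logits = fully_connected(hidden2_drop, 10, activation_fn=None, scope="out")
```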

Just as with Batch Normalization, we need to set is_training to True during training and to False at test time. When we observe that the model is overfitting, we can increase the dropout rate, i.e. decrease keep_prob; conversely, if the model is underfitting, we can decrease the dropout rate, i.e. increase keep_prob. Increasing the dropout rate for large networks and decreasing it for small ones usually helps.

One caveat: adding dropout generally slows down convergence noticeably, but it usually yields a better model, so it is well worth it.

4. Max-Norm Regularization

Another regularization technique commonly used in neural networks is max-norm regularization. For each neuron, the vector w of its incoming connection weights is constrained so that

||w||_2 ≤ r

where ||w||_2 is the L2 norm and r is the max-norm hyperparameter. In practice, we enforce this constraint by computing the L2 norm of w after each training step and clipping it back when it exceeds r.

Max-norm regularization usually reduces overfitting, and if Batch Normalization is not used it can also alleviate the vanishing and exploding gradient problems.

TensorFlow does not provide a ready-made max-norm regularization function, but it is not hard to implement, along the lines of the sketch below:
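This is a minimal sketch rather than the original code: weights is assumed to be one layer's weight variable (fetched, for example, as shown a little further below), and the threshold r is illustrative:

```python
threshold = 1.0  # the max norm r (illustrative)

# Clip each row's L2 norm back to the threshold and write the result
# back into the weights variable.
clipped_weights = tf.clip_by_norm(weights, clip_norm=threshold, axes=1)
clip_weights = tf.assign(weights, clipped_weights)
```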

The code above creates a clip_weights node that rescales the weights variable. We can run this operation after every training iteration, for example:
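As before, the training loop, placeholders and next_batch() helper are assumed to exist:

```python
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(n_steps):
        X_batch, y_batch = next_batch(batch_size)   # assumed mini-batch helper
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        sess.run(clip_weights)                      # clip right after each update
```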

We can fetch a layer's weights variable through its variable scope, or alternatively directly from the root scope using the variable's full name; both approaches are sketched below:
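A sketch of both approaches, assuming the layers were built with tf.contrib.layers.fully_connected under the scopes "hidden1" and "hidden2" (the weight variable is then named "weights"; it would be "kernel" if the layers were built with tf.layers.dense):

```python
# Through the layer's own variable scope
with tf.variable_scope("hidden1", reuse=True):
    weights1 = tf.get_variable("weights")

# Or directly from the root scope, using the variables' full names
with tf.variable_scope("", default_name="", reuse=True):
    weights1 = tf.get_variable("hidden1/weights")
    weights2 = tf.get_variable("hidden2/weights")
```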

 

If we do not know a variable's name, we can look it up in TensorBoard or simply list all variables with the global_variables() function, as follows:
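For example:

```python
for variable in tf.global_variables():
    print(variable.name)
```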

Although the approach above works, it is somewhat tedious. A cleaner solution is to create a max-norm regularization function, analogous to the L1 and L2 regularization functions we saw earlier, for example:
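One possible way to write such a factory (a sketch with illustrative names; it returns no loss term and instead stores the clip operation in a collection):

```python
def max_norm_regularizer(threshold, axes=1, name="max_norm",
                         collection="max_norm"):
    def max_norm(weights):
        clipped = tf.clip_by_norm(weights, clip_norm=threshold, axes=axes)
        clip_weights = tf.assign(weights, clipped, name=name)
        tf.add_to_collection(collection, clip_weights)   # run this op after each step
        return None  # no regularization term is added to the loss function
    return max_norm
```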

The function above returns a parameterized max_norm() function, which can then be used like any other regularizer:
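For example, reusing the arg_scope and fully_connected imports from the L1 sketch earlier (layer sizes and scope names are again illustrative):

```python
max_norm_reg = max_norm_regularizer(threshold=1.0)

with arg_scope([fully_connected],
               weights_regularizer=max_norm_reg):
    hidden1 = fully_connected(X, 300, scope="hidden1")
    hidden2 = fully_connected(hidden1, 50, scope="hidden2")
    logits = fully_connected(hidden2, 10, activation_fn=None, scope="out")
```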

Note that max-norm regularization does not add a regularization term to the overall loss function, which is why max_norm() returns None. But we still need to run clip_weights after every training iteration, which is why max_norm() adds clip_weights to a collection: we fetch the clip operations from that collection and run them after each iteration, as follows:
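A sketch using the "max_norm" collection name from the factory above (the training loop pieces are assumed as before):

```python
clip_all_weights = tf.get_collection("max_norm")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(n_steps):
        X_batch, y_batch = next_batch(batch_size)   # assumed mini-batch helper
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        sess.run(clip_all_weights)                  # run every clip op in the collection
```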

 

5. Data Augmentation

The last regularization technique is data augmentation: artificially generating new training examples from the existing data in order to enlarge the data set and reduce overfitting. For example, if we want to classify pictures of mushrooms, we can generate new images by slightly rotating, shifting, and rescaling the originals.

This forces the model to become tolerant to the position, size, and orientation of the mushrooms; changing the contrast of the images likewise makes the model tolerant to different lighting conditions. With these methods the data set can be greatly expanded.

TensorFlow provides a number of image operations such as shifting, rotating, rescaling, flipping, cropping, and adjusting brightness, contrast, saturation, and hue, which makes data augmentation quite convenient.
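For example, a hypothetical augmentation function built from a few tf.image operations (the perturbation ranges are illustrative):

```python
import tensorflow as tf

def augment(image):
    # Randomly perturb a single 3-channel image tensor.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    return image
```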

 

6. Summary

In this issue we went through several common methods for avoiding overfitting: early stopping, L1 and L2 norm regularization, Dropout, max-norm regularization, and data augmentation. Each has its strengths; finding the method that best fits your own model is what matters most.

 

(To learn more about related topics, you are welcome to join the Smart Algorithm community: send "community" to the "Smart Algorithm" public account to join our WeChat and QQ groups.)

 

Original article: blog.csdn.net/x454045816/article/details/92208722