Understand the learning rate and how it can improve the performance of deep learning

Source: ATYUN AI platform 

The learning rate is an important hyperparameter in deep learning, and knowing how to adjust it is a key element of training a good model. This article focuses on the following points:

  • What is the learning rate, and why does it matter?
  • How do we systematically arrive at a good learning rate?
  • Why should we change the learning rate during training?
  • How do we handle the learning rate when using a pre-trained model?

First, what is the learning rate? The learning rate is a hyperparameter that controls how much we adjust our network's weights with respect to the loss gradient. The lower the value, the slower we travel down the slope. While a low learning rate may be a good way to make sure we do not miss any local minima, it can also mean we take a long time to converge, especially if we get stuck on a plateau region. The following formula shows this relationship.

new_weight = existing_weight - learning_rate * gradient
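As a toy illustration of this update rule (not from the original article: the quadratic loss and its gradient here are invented for the example):

```python
# Repeatedly apply: new_weight = existing_weight - learning_rate * gradient
# on a toy loss (w - 3)^2, whose gradient with respect to w is 2 * (w - 3).

def gradient(w):
    """Gradient of the toy loss (w - 3)**2."""
    return 2.0 * (w - 3.0)

def train(learning_rate, steps=50, w=0.0):
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w

# The minimum sits at w = 3. A very low rate converges slowly,
# while a moderate rate gets there much faster.
print(train(learning_rate=0.01))  # still well short of 3 after 50 steps
print(train(learning_rate=0.1))   # very close to 3
```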

 

Gradient descent with a small (top) and a large (bottom) learning rate. Source: Coursera's Machine Learning course

In general, the learning rate is configured somewhat arbitrarily by the user. At best, the user draws on past experience (or other learning materials) to pick a good value.

Even so, it is often difficult to get it right. The illustration below shows the different scenarios one can encounter when configuring the learning rate.

The effect of different learning rates on convergence

In addition, the learning rate influences how quickly our model converges to a local minimum (that is, reaches the best accuracy). Getting it right from the outset means we spend less time training the model.

Less training time also means less money spent on GPU cloud computing.

Is there a better way to determine the learning rate?
You can train the model starting from a very low learning rate and increase it (linearly or exponentially) at each iteration to estimate a good learning rate.

 

The learning rate increases after each mini-batch iteration

If we record the learning rate and the loss at each iteration and plot the (log-scaled) learning rate against the loss, we will see that as the learning rate increases, there comes a point where the loss stops decreasing and starts to increase. In practice, our learning rate should ideally sit somewhere around the lowest point of this curve (as shown below). In this case, that is between 0.001 and 0.01.

 

The approach above seems useful.

How do I start using it?
It is currently supported as a function in the fast.ai package, a deep learning library developed by Jeremy Howard that provides an abstraction over PyTorch (much as Keras provides an abstraction over TensorFlow). Simply run the following commands to find the optimal learning rate before training a neural network.

# learn is an instance of Learner class or one of derived classes like ConvLearner
learn.lr_find()
learn.sched.plot_lr()
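For readers not using fast.ai, the range test behind lr_find() can be sketched in plain Python. This is an illustrative sketch, not fast.ai's implementation: step_fn stands in for one mini-batch update of a real model, and the toy quadratic model is invented for the example.

```python
# Learning rate range test: grow the rate exponentially over one pass,
# record the loss at each step, and pick a rate near the loss minimum.

def lr_range_test(step_fn, lr_start=1e-6, lr_end=10.0, num_steps=100):
    """step_fn(lr) runs one mini-batch update and returns the loss."""
    factor = (lr_end / lr_start) ** (1.0 / (num_steps - 1))
    lrs, losses = [], []
    lr = lr_start
    for _ in range(num_steps):
        lrs.append(lr)
        losses.append(step_fn(lr))
        lr *= factor
    # One common heuristic: choose a rate somewhat below the loss minimum.
    best = min(range(num_steps), key=lambda i: losses[i])
    return lrs, losses, lrs[best] / 10.0

# Toy quadratic "model": the loss falls while the rate is reasonable,
# then diverges once the rate becomes too large.
w = [5.0]
def toy_step(lr):
    grad = 2.0 * w[0]
    w[0] = w[0] - lr * grad
    return w[0] ** 2

lrs, losses, suggested = lr_range_test(toy_step)
```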

 

Improving performance
We have discussed what the learning rate is, why it matters, and how to systematically reach an optimal value before we start training our model. Next, we will look at how the learning rate can be used to improve a model's performance.

Usually, once someone sets a learning rate and trains the model, they simply wait for the rate to decay over time while the model converges. However, as the gradient reaches a plateau, the training loss becomes harder to improve. The difficulty of minimizing the loss arises from saddle points rather than local minima.

 

A saddle point on the error surface. A saddle point is a point where the function's derivatives are zero but which is not a local extremum along all axes.

So what can we do about it? We have several options. Generally speaking, if training no longer improves the loss, we can vary the learning rate at each iteration according to some cyclic function f. Each cycle has a fixed number of iterations. This method lets the learning rate oscillate between reasonable boundary values. It helps because, if we get stuck at a saddle point, increasing the learning rate lets us traverse the plateau more quickly. One such approach is the "triangular" method, in which the learning rate restarts every few iterations.
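To make the triangular policy concrete, here is a minimal sketch following the formula from Leslie Smith's cyclical learning rates paper; the rate bounds and step size are illustrative choices, not values from this article:

```python
# Triangular cyclical learning rate: ramp linearly from base_lr to max_lr
# and back over each cycle. step_size is the length of a half-cycle.

def triangular_lr(iteration, base_lr=0.001, max_lr=0.006, step_size=100):
    cycle = iteration // (2 * step_size)             # which cycle we are in
    x = abs(iteration / step_size - 2 * cycle - 1)   # position in the cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# At the start of a cycle the rate is base_lr, at the midpoint it is max_lr,
# and after 2 * step_size iterations the cycle restarts.
print(triangular_lr(0))    # base_lr
print(triangular_lr(100))  # max_lr
print(triangular_lr(200))  # back to base_lr
```

The "triangular2" variant mentioned below additionally halves the gap between base_lr and max_lr after each cycle.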

 

The "triangular" (top) and "triangular2" (bottom) policies. In the top plot, the minimum and maximum learning rates stay fixed. In the bottom plot, the difference between them is halved after each cycle.

Another popular method is called "stochastic gradient descent with warm restarts" (SGDR). It mainly uses a cosine function as the cyclic function and restarts the learning rate at its maximum value in each cycle. When the learning rate restarts, it does not start from scratch: the model resumes from the parameters it converged to in the previous steps.

Although there are variations, the figure below shows one implementation in which every cycle is set to the same length.
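Such a cosine schedule with warm restarts can be sketched as follows (a minimal sketch: the rates and cycle length are illustrative assumptions, and real implementations track the restart point rather than using a simple modulo):

```python
import math

# SGDR-style schedule: within each cycle the rate decays from lr_max to
# lr_min along a cosine curve, then restarts at lr_max.

def sgdr_lr(iteration, lr_min=0.001, lr_max=0.1, cycle_len=100):
    t = iteration % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))

print(sgdr_lr(0))    # cycle start: the maximum rate
print(sgdr_lr(50))   # mid-cycle: roughly halfway between the bounds
print(sgdr_lr(100))  # restart: back to the maximum rate
```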

 

An SGDR schedule: learning rate vs. iteration

So we now have a way to reduce training time by, essentially, periodically jumping over the "peaks" (figure below).

 

A standard learning rate vs. a cyclic learning rate

Besides saving time, these methods can also improve classification accuracy without extra tuning and in fewer iterations.

Learning rates in transfer learning
The fast.ai course places great emphasis on making the most of pre-trained models when solving AI problems. For example, when solving an image classification problem, students are taught how to take a pre-trained model such as VGG or ResNet50 and connect it to whatever image dataset they want to predict on.

To summarize how models are built quickly in the fast.ai course (not to be confused with the fast.ai package), here are the 8 steps we typically take:

1. Enable data augmentation and set precompute=True
2. Use lr_find() to find the highest learning rate at which the loss is still clearly improving
3. Train the last layer on precomputed activations for 1-2 epochs
4. Train the last layer with data augmentation (i.e., precompute=False) for 2-3 cycles with cycle_len=1
5. Unfreeze all layers
6. Set the earlier layers to a learning rate 3x-10x lower than that of the next higher layer
7. Use lr_find() again
8. Train the full network with cycle_mult=2 until overfitting
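The steps above roughly map to the old fast.ai (v0.7) course API. The sketch below will not run without that environment, and `arch`, `data`, and the chosen rates are assumptions for illustration rather than values from this article:

```python
from fastai.conv_learner import *  # fast.ai v0.7 API

learn = ConvLearner.pretrained(arch, data, precompute=True)  # steps 1, 3
learn.lr_find()                                              # step 2
learn.fit(1e-2, 2)                                           # step 3
learn.precompute = False                                     # step 4
learn.fit(1e-2, 3, cycle_len=1)                              # step 4
learn.unfreeze()                                             # step 5
lrs = np.array([1e-4, 1e-3, 1e-2])                           # step 6
learn.lr_find(lrs / 1000)                                    # step 7
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)                 # step 8
```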

In the final section, we discuss differential learning rates.

What are differential learning rates?
Differential learning rates are a method in which you set different learning rates for different layers of the network during training. This contrasts with the usual way of configuring the learning rate, namely using the same rate across the entire network throughout training.

To illustrate this concept more clearly, refer to the figure below, in which a pre-trained model is split into 3 groups, each configured with a progressively higher learning rate.

 

A sample convolutional neural network (CNN) with differential learning rates

The intuition behind this configuration is that the first few layers usually capture very fine-grained details of the data, such as lines and edges, which we typically do not want to change much and whose information we want to preserve. Their weights therefore do not need to change substantially.

In contrast, later layers, such as the green ones above, capture richer, more task-specific detail of the data, which we may not need to preserve.
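This idea can be sketched without any framework: each layer group gets its own rate, with earlier groups updated more gently. The group names and rate values below are illustrative assumptions, not from the article.

```python
# Differential learning rates: apply the standard update rule per group,
# each group with its own learning rate.

def apply_updates(weights, gradients, group_lrs):
    """weights / gradients: {group: [float, ...]}; group_lrs: {group: rate}."""
    return {
        group: [w - group_lrs[group] * g for w, g in zip(ws, gradients[group])]
        for group, ws in weights.items()
    }

weights = {"early": [1.0, 1.0], "middle": [1.0, 1.0], "head": [1.0, 1.0]}
grads = {group: [1.0, 1.0] for group in weights}
lrs = {"early": 1e-4, "middle": 1e-3, "head": 1e-2}  # rates rise toward the head

new_w = apply_updates(weights, grads, lrs)
# With identical gradients, the head moves 100x further than the early layers.
```

In PyTorch, the same effect is achieved by passing several parameter groups, each with its own `lr` entry, to an optimizer such as `torch.optim.SGD`.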

This article was reproduced from the ATYUN AI media platform. Original article: "Understanding the learning rate and how it can improve deep learning performance."

