Series Notes | Deep Learning Series (2): Gradient Descent


Let us recall the "three steps" of deep learning:

1. Choose a neural network (a set of candidate functions)

2. Define a measure of how good a network is (the loss function)

3. Select the best set of parameters

For the third step, how do we actually find the best network?

Gradient descent is one of the most effective methods.

Method: suppose the network has two parameters θ₁ and θ₂, and the loss function is L. Then the gradient is:

∇L(θ) = (∂L/∂θ₁, ∂L/∂θ₂)ᵀ

In order to reach the minimum of L, we update iteratively:

θ ← θ − η ∇L(θ)

That is, at every step the parameters are updated by subtracting the gradient multiplied by the learning rate η.

So why does the formula above use a minus sign rather than a plus sign?

The direction in which θ changes is the direction of movement, while the gradient points in the direction of steepest increase of L, normal to the contour lines. To make L decrease we must move against the gradient, which is why we subtract.
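To make the update rule concrete, here is a minimal Python sketch, assuming a toy quadratic loss L(θ) = (θ₁ − 1)² + 2(θ₂ + 3)² chosen purely for illustration (none of the names below come from the original post):

```python
import numpy as np

# Toy loss: L(theta) = (theta1 - 1)^2 + 2 * (theta2 + 3)^2, minimum at (1, -3).
def loss(theta):
    t1, t2 = theta
    return (t1 - 1.0) ** 2 + 2.0 * (t2 + 3.0) ** 2

# Its gradient: (dL/dtheta1, dL/dtheta2).
def grad(theta):
    t1, t2 = theta
    return np.array([2.0 * (t1 - 1.0), 4.0 * (t2 + 3.0)])

eta = 0.1                      # learning rate
theta = np.array([0.0, 0.0])   # initial parameters
for _ in range(100):
    theta = theta - eta * grad(theta)   # step against the gradient

print(theta, loss(theta))      # ends close to (1, -3), loss near 0
```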

Having introduced basic Gradient Descent, let us now explore some tips for using it.

Setting the learning rate

If the learning rate η is set poorly, the loss may increase instead of decreasing: too small and training crawls, too large and the loss oscillates or blows up.
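A small sketch of this effect, assuming a one-dimensional loss L(w) = w² (an illustrative choice, not from the original post):

```python
# Run gradient descent on L(w) = w^2 (gradient 2w) for a given learning rate.
def final_loss(eta, steps=20, w=2.0):
    for _ in range(steps):
        w = w - eta * 2.0 * w
    return w ** 2

for eta in [0.01, 0.1, 1.1]:
    print(f"eta={eta}: final loss = {final_loss(eta):.4g}")
# eta=0.01 decreases the loss only slowly, eta=0.1 converges quickly,
# and eta=1.1 makes the loss grow instead of shrink.
```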

Adaptive learning rate

In much machine learning code, the learning rate is simply set to a fixed value (which then has to be tuned by hand).

From training experience, two general rules emerge:

1. At the start of training, the learning rate should be larger

2. After several epochs, as the loss approaches its minimum, the learning rate should be reduced

Adagrad: divide the conventional learning rate by the square root of the sum of the squares of all past derivatives:

θ ← θ − (η / √(g₀² + g₁² + … + g_t²)) · g_t, where g_i is the gradient at step i.
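A minimal Adagrad sketch, reusing the toy quadratic gradient from the earlier example (grad, eta, and the iteration count are illustrative assumptions):

```python
import numpy as np

# Gradient of the toy loss L(theta) = (theta1 - 1)^2 + 2 * (theta2 + 3)^2.
def grad(theta):
    return np.array([2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)])

eta = 1.0
theta = np.array([0.0, 0.0])
accum = np.zeros_like(theta)   # running sum of squared past gradients
eps = 1e-8                     # guards against division by zero

for _ in range(1000):
    g = grad(theta)
    accum += g ** 2
    # Per-parameter step: eta divided by the root of the accumulated squares.
    theta = theta - eta / (np.sqrt(accum) + eps) * g

print(theta)   # moves toward the minimum at (1, -3)
```

Note how each parameter gets its own effective learning rate, which automatically shrinks as training progresses, matching the two rules listed above.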

Stochastic Gradient Descent (SGD)

It makes training faster.

Vanilla GD performs one parameter update after processing all of the training data.

SGD performs a parameter update after every single example.

Comparing the effect of GD and SGD:
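Since the original comparison figure cannot be reproduced here, the following sketch contrasts the two on a least-squares fit of y = w·x (the synthetic data and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 0.1 * rng.normal(size=100)   # true w = 3

def batch_gd(w=0.0, eta=0.1, epochs=10):
    for _ in range(epochs):
        g = np.mean(2.0 * (w * x - y) * x)   # gradient over ALL examples
        w -= eta * g                          # one update per epoch
    return w

def sgd(w=0.0, eta=0.1, epochs=10):
    for _ in range(epochs):
        for xi, yi in zip(x, y):
            g = 2.0 * (w * xi - yi) * xi      # gradient of ONE example
            w -= eta * g                      # one update per example
    return w

print(batch_gd(), sgd())
# After the same 10 epochs, batch GD is still approaching w = 3, while SGD,
# having made 100x more (noisier) updates, is already very close.
```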

Feature Scaling

The idea is to make the different dimensions of the data vary over the same range.

During training it is then obvious at a glance which version trains well: after scaling, the loss contours are roughly circular and gradient descent heads straight for the minimum, whereas elongated contours make training much harder.

Normalization method: for each dimension i, compute the mean mᵢ and standard deviation σᵢ over all examples, then replace each value xᵢ with (xᵢ − mᵢ)/σᵢ.
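A minimal standardization sketch, assuming a small feature matrix X whose rows are examples and whose columns are feature dimensions (the example values are made up):

```python
import numpy as np

X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])   # two dimensions with very different scales

mean = X.mean(axis=0)   # m_i: per-dimension mean
std = X.std(axis=0)     # sigma_i: per-dimension standard deviation
X_scaled = (X - mean) / std

print(X_scaled)   # every column now has mean 0 and standard deviation 1
```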

Summary: Gradient Descent is the "universal" method for solving optimization problems in machine learning and deep learning, but it also runs into many problems, such as local minima and saddle points. We will discuss these in later installments.

Many of the figures and formulas in this column come from Prof. Hung-yi Lee of National Taiwan University and from Stanford's cs229, cs231n, and cs224n courses. Our thanks and respect go to these classic courses!

About the author: Wu Qiang, PhD, Lanzhou University; Google Developer Expert (GDE, Machine Learning).

CSDN:https://me.csdn.net/dukuku5038 

知乎:https://www.zhihu.com/people/Dr.Wu/activities 

AI comics WeChat public account: DayuAI-Founder

Series notes:

Series Notes | Deep Learning Series (1): Neural Networks

           

