A review of gradient descent optimization methods

2018-06-08 08:47:50

Suppose we want to optimize a parameterized function \(J(\theta)\), where \(\theta \in \mathbb{R}^d\). For example, \(\theta\) could be the weights of a neural net.

More specifically, we want to minimize \(J(\theta; \mathcal{D})\) on a dataset \(\mathcal{D}\), where each point in \(\mathcal{D}\) is a pair \((x_i, y_i)\).

There are different ways to apply gradient descent.

Let \(\eta\) be the learning rate.

Vanilla batch update
\(\theta \gets \theta - \eta \nabla J(\theta; \mathcal{D})\)
Note that \(\nabla J(\theta; \mathcal{D})\) is the gradient computed over the whole dataset \(\mathcal{D}\).
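```python
for i in range(n_epochs):
    # compute_gradient is a placeholder for the gradient of J over all of D
    gradient = compute_gradient(J, theta, D)
    theta = theta - eta * gradient
    eta = eta * 0.95  # decay the learning rate after every epoch
```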
When \(\mathcal{D}\) is very large, every single update requires a full pass over the data, so this approach becomes infeasible.
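To make the batch update concrete, here is a minimal runnable sketch. Everything specific in it is an assumption for illustration and not part of the original post: the toy dataset, the least-squares objective with \(\theta = (w, b)\), the helper names `full_gradient` and `batch_gd`, and the constant learning rate (no decay).

```python
# Toy dataset (assumed for illustration): points on the line y = 2x + 1.
D = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

def full_gradient(theta, D):
    """Gradient of J(theta; D) = mean over D of 0.5 * (w*x + b - y)^2."""
    w, b = theta
    gw = sum((w * x + b - y) * x for x, y in D) / len(D)
    gb = sum((w * x + b - y) for x, y in D) / len(D)
    return gw, gb

def batch_gd(D, eta=0.1, n_epochs=2000):
    """Vanilla batch gradient descent: one update per full pass over D."""
    theta = (0.0, 0.0)
    for _ in range(n_epochs):
        gw, gb = full_gradient(theta, D)  # touches every point in D
        theta = (theta[0] - eta * gw, theta[1] - eta * gb)
    return theta

w, b = batch_gd(D)
print(w, b)  # should be close to the true slope 2 and intercept 1
```

Each call to `full_gradient` visits every point of `D`, which is exactly the cost that becomes prohibitive on large datasets.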

Stochastic Gradient Descent
Stochastic gradient descent, on the other hand, updates the parameters one example at a time.
\(\theta \gets \theta - \eta \nabla J(\theta; x_i, y_i)\), where \((x_i, y_i) \in \mathcal{D}\).
```python
for n in range(n_epochs):
    for x_i, y_i in D:
        # gradient of J on the single example (x_i, y_i)
        gradient = compute_gradient(J, theta, x_i, y_i)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay once per epoch
```
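A runnable version of the per-example update might look like the sketch below. The toy dataset, the squared-error loss, the names `example_gradient` and `sgd`, the per-epoch shuffling, and the constant learning rate are all assumptions made for illustration. Because the toy data is noise-free, a constant rate is enough here; on noisy data the iterates would keep jittering around the minimum.

```python
import random

# Toy dataset (assumed for illustration): points on the line y = 2x + 1.
D = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

def example_gradient(theta, x_i, y_i):
    """Gradient of 0.5 * (w*x_i + b - y_i)^2 with respect to (w, b)."""
    w, b = theta
    err = w * x_i + b - y_i
    return err * x_i, err

def sgd(D, eta=0.05, n_epochs=2000, seed=0):
    """Stochastic gradient descent: one update per training example."""
    rng = random.Random(seed)
    data = list(D)
    theta = (0.0, 0.0)
    for _ in range(n_epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order
        for x_i, y_i in data:
            gw, gb = example_gradient(theta, x_i, y_i)
            theta = (theta[0] - eta * gw, theta[1] - eta * gb)
    return theta

w, b = sgd(D)
print(w, b)  # should be close to the true slope 2 and intercept 1
```

Compared with the batch version, each update costs a single example, so the parameters move many times per pass over the data.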
Mini-batch Stochastic Gradient Descent
Updating \(\theta\) example by example can lead to high-variance updates. An alternative is to update \(\theta\) on mini-batches \(M\), where \(|M| \ll |\mathcal{D}|\).
```python
for n in range(n_epochs):
    for M in D:  # D is assumed to be already split into mini-batches M
        gradient = compute_gradient(J, theta, M)
        theta = theta - eta * gradient
    eta = eta * 0.95  # decay once per epoch
```
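The mini-batch loop can be sketched in runnable form as follows. As before, the toy dataset, the squared-error loss, the batch size, and the helper names `make_batches`, `minibatch_gradient` and `minibatch_sgd` are hypothetical choices made for illustration, and the learning rate is kept constant for simplicity.

```python
import random

# Toy dataset (assumed for illustration): points on the line y = 2x + 1.
D = [(float(x), 2.0 * x + 1.0) for x in range(8)]

def make_batches(data, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

def minibatch_gradient(theta, M):
    """Average gradient of 0.5 * (w*x + b - y)^2 over the mini-batch M."""
    w, b = theta
    gw = sum((w * x + b - y) * x for x, y in M) / len(M)
    gb = sum((w * x + b - y) for x, y in M) / len(M)
    return gw, gb

def minibatch_sgd(D, batch_size=2, eta=0.02, n_epochs=3000, seed=0):
    """Mini-batch SGD: one update per mini-batch, reshuffled every epoch."""
    rng = random.Random(seed)
    data = list(D)
    theta = (0.0, 0.0)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for M in make_batches(data, batch_size):
            gw, gb = minibatch_gradient(theta, M)
            theta = (theta[0] - eta * gw, theta[1] - eta * gb)
    return theta

w, b = minibatch_sgd(D)
print(w, b)  # should be close to the true slope 2 and intercept 1
```

Averaging over a few examples smooths each step relative to single-example updates while keeping the cost per update far below a full pass.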

Question: why does decaying the learning rate lead to convergence?
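One common line of reasoning, offered here as a hedged sketch since the post leaves the question open: with a fixed \(\eta\), the noise in each stochastic gradient keeps kicking \(\theta\) around the minimum, so the iterates settle into a band whose size grows with \(\eta\). Shrinking \(\eta\) shrinks that band. The classical Robbins-Monro conditions, \(\sum_t \eta_t = \infty\) and \(\sum_t \eta_t^2 < \infty\), express the trade-off: the steps must shrink, but not so fast that the iterates stall before reaching the minimum. The toy objective \(J(\theta) = \frac{1}{2}\theta^2\), the Gaussian noise model, the function name `noisy_sgd`, and the schedule \(\eta_t = 1/(t+1)\) below are all assumptions chosen for illustration.

```python
import random

def noisy_sgd(schedule, n_steps=5000, seed=0):
    """SGD on J(theta) = 0.5 * theta^2 with a noisy gradient theta + noise.

    Returns the mean of theta^2 over the last 1000 steps, i.e. how far the
    iterates stay from the minimum at theta = 0 late in the run.
    """
    rng = random.Random(seed)
    theta = 5.0
    tail = []
    for t in range(n_steps):
        gradient = theta + rng.gauss(0.0, 1.0)  # true gradient plus noise
        theta = theta - schedule(t) * gradient
        if t >= n_steps - 1000:
            tail.append(theta * theta)
    return sum(tail) / len(tail)

constant_err = noisy_sgd(lambda t: 0.5)            # fixed learning rate
decayed_err = noisy_sgd(lambda t: 1.0 / (t + 1))   # decaying learning rate
print(constant_err, decayed_err)
```

With the fixed rate the error levels off at a value set by the noise, while with the decaying rate it keeps shrinking. Note that a geometric schedule such as multiplying \(\eta\) by 0.95 every epoch has a finite sum over epochs (20 times the initial \(\eta\)), so it can stop short of the minimum if the initial rate is small.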

Reposted from www.cnblogs.com/gaoqichao/p/9153675.html
