[Algorithm] Gradient descent algorithm and Python implementation

1. Overview

Gradient descent is widely used in machine learning. Whether in linear regression or logistic regression, its main purpose is to find the minimum value of the objective function through iteration, or to converge to that minimum.
This article starts with a downhill scene to propose the basic idea of the gradient descent algorithm, then explains the principle of the algorithm mathematically, explains why the gradient is used, and finally implements a simple example of the gradient descent algorithm!

2. Gradient descent algorithm

2.1 Scenario assumptions

The basic idea of the gradient descent method can be compared to walking down a mountain.
Imagine the following scenario: a person is trapped on a mountain and needs to get down, that is, to find the lowest point of the mountain, the valley. However, the fog on the mountain is very thick, so visibility is low and the path down cannot be seen; he must use the information around him to find the way down step by step. This is where the gradient descent algorithm can help. The idea is: take the current position as the starting point, find the steepest downhill direction at that position, take a step in that direction, then take the new position as the starting point, find the steepest direction again, and keep walking until finally reaching the lowest point. Going up the mountain works the same way, except that it then becomes the gradient ascent algorithm.

2.2 Gradient Descent

The basic process of gradient descent is very similar to the downhill scene.

First, we have a differentiable function. This function represents a mountain, and our goal is to find its minimum value, which is the bottom of the mountain. Following the scenario above, the fastest way down is to find the steepest direction at the current position and walk down in that direction. For the function, this means finding the gradient at a given point and then moving in the direction opposite to the gradient, which makes the function value decrease fastest, because the direction of the gradient is the direction in which the function changes fastest (this will be explained in detail later).
We repeat this process, computing the gradient again and again, and finally reach a local minimum, which is just like walking down the mountain. Computing the gradient determines the steepest direction, which plays the role of measuring direction in the scenario. So why is the direction of the gradient the steepest direction? Let's start with differentiation:

2.2.1 Differentiation

There are different angles from which to look at the meaning of differentiation; the two most common are:

In the function image, the slope of the tangent line at a certain point
The rate of change of the function
Several examples of differentiation:
1. Univariate differentiation, where the function has only one variable
[Figure: example of univariate differentiation]
2. Multivariate differentiation, where the function has multiple variables, that is, we differentiate with respect to each variable separately
[Figure: example of multivariate (partial) differentiation]
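Since the original example figures are not available, here is a small illustrative check (my own addition, with the function f(x) = x² chosen as an assumption): the analytic derivative and a finite-difference approximation agree.

# Illustrative sketch (not from the original article): the derivative of
# f(x) = x^2 is 2x; a central finite difference gives nearly the same value.
def f(x):
    return x ** 2

def analytic_derivative(x):
    return 2 * x

def numerical_derivative(func, x, h=1e-6):
    return (func(x + h) - func(x - h)) / (2 * h)

print(analytic_derivative(3.0))      # 6.0
print(numerical_derivative(f, 3.0))  # approximately 6.0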

2.2.2 Gradient

The gradient is really just a generalization of multivariate differentiation.
Consider the following example:
[Figure: example of computing the gradient of a multivariate function]
We can see that the gradient is obtained by differentiating the function with respect to each variable separately and then listing the results, separated by commas. The gradient is written inside <>, which indicates that the gradient is actually a vector.
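As a small illustration (the function J(θ1, θ2) = θ1² + θ2² below is my own assumption, not necessarily the one in the original figure), the gradient is just the vector of partial derivatives:

# Illustrative sketch: the gradient of J(theta1, theta2) = theta1^2 + theta2^2
# is the vector <2*theta1, 2*theta2> of its partial derivatives.
def gradient_J(theta1, theta2):
    return (2 * theta1, 2 * theta2)

print(gradient_J(1.0, 3.0))  # (2.0, 6.0), the direction of fastest increase at (1, 3)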

The gradient is a very important concept in calculus; its meaning was already touched on above:

In a univariate function, the gradient is simply the derivative of the function, which represents the slope of the tangent to the function at a given point. In a multivariate function, the gradient is a vector, and a vector has a direction: the direction of the gradient is the direction in which the function rises fastest at the given point.
This also explains why we go to such lengths to find the gradient! We want to reach the bottom of the mountain, so at every step we need to find the steepest direction, and the gradient tells us exactly that. The direction of the gradient is the direction in which the function rises fastest at a given point, so the opposite direction is the direction in which the function decreases fastest at that point, which is exactly what we need. As long as we keep walking in the direction opposite to the gradient, we will reach a local minimum!

2.3 Mathematical explanation

Θ1 = Θ0 - α∇J(Θ0)

The meaning of this formula is: J is a function of Θ, our current position is the point Θ0, and we need to walk from this point to the minimum point of J, which is the bottom of the mountain. First we determine the direction in which to move, which is the reverse of the gradient; then we take a step of a certain length, controlled by α. After taking this step, we arrive at the point Θ1!

2.3.1 The learning rate α

In the gradient descent algorithm, α is called the learning rate or step size. Through α we can control the distance of each step and make sure the steps are not too large; in other words, we must not walk so fast that we step over the lowest point. At the same time, we must also make sure not to walk too slowly, or the sun will have set before we reach the bottom of the mountain. So the choice of α is often very important in gradient descent! α must be neither too large nor too small: if it is too small, it may take far too long to reach the lowest point; if it is too large, we may overshoot the lowest point!
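A tiny experiment (my own addition, using the assumed function J(θ) = θ², whose gradient is 2θ) shows both failure modes:

# Illustrative sketch: the effect of the learning rate alpha on J(theta) = theta^2.
# The update rule is theta <- theta - alpha * 2 * theta.
def run(alpha, steps=10, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(alpha=0.001))  # about 0.98: still far from the minimum, too slow
print(run(alpha=0.4))    # about 1e-7: converges nicely
print(run(alpha=1.1))    # about 6.2 in magnitude: overshoots and diverges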

2.3.2 The gradient is multiplied by a negative sign

Adding a minus sign in front of the gradient means moving in the direction opposite to the gradient! As mentioned earlier, the direction of the gradient is the direction in which the function rises fastest at a point, while we want to move in the direction of fastest descent, which is naturally the direction of the negative gradient, so the minus sign is needed here. If instead we wanted to go uphill, that is, to use the gradient ascent algorithm, then of course no minus sign would be needed.

3. Examples

3.1 Gradient Descent for a Single-Variable Function

[Figure: worked example of gradient descent on a single-variable function]
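Since the original worked example is only available as an image, here is an equivalent sketch in Python (the function J(θ) = θ², the starting point θ0 = 1 and the step size α = 0.4 are my own assumptions):

# Illustrative sketch of gradient descent on a single-variable function.
# Assumed setup: J(theta) = theta^2, with derivative 2*theta,
# starting point theta = 1 and learning rate alpha = 0.4.
theta = 1.0
alpha = 0.4
for i in range(5):
    theta = theta - alpha * 2 * theta  # theta_new = theta_old - alpha * J'(theta_old)
    print(i + 1, theta)
# theta shrinks towards the minimum at 0: 0.2, 0.04, 0.008, 0.0016, 0.00032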

3.2 Gradient Descent for Multivariate Functions

[Figure: worked example of gradient descent on a multivariate function]
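Again the original example is an image, so the sketch below uses an assumed setup: J(θ1, θ2) = θ1² + θ2², starting point (1, 3) and learning rate 0.1.

# Illustrative sketch of gradient descent on a multivariate function.
# Assumed setup: J(theta1, theta2) = theta1^2 + theta2^2, whose gradient is
# <2*theta1, 2*theta2>, starting point (1, 3), learning rate 0.1.
theta1, theta2 = 1.0, 3.0
alpha = 0.1
for _ in range(20):
    grad1, grad2 = 2 * theta1, 2 * theta2
    theta1 = theta1 - alpha * grad1
    theta2 = theta2 - alpha * grad2
print(theta1, theta2)  # both close to 0, the minimum of J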

4. Code implementation

4.1 Scenario Analysis

Below we will implement a simple gradient descent algorithm in Python. The scenario is a simple linear regression example: suppose we have a series of points, as shown in the figure below:
[Figure: scatter plot of the sample points]
[Figures: the hypothesis, cost function, and gradient formulas for this example]
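The formulas in the original figures are not available, but they can be read off from the code in section 4.2 below: the hypothesis is a straight line

h(Θ) = Θ0 + Θ1 · x1,

the cost function is the mean squared error in matrix-vector form

J(Θ) = 1/(2m) · (XΘ - y)ᵀ(XΘ - y),

and its gradient with respect to Θ is

∇J(Θ) = (1/m) · Xᵀ(XΘ - y),

which is exactly what cost_function and gradient_function implement below.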

4.2 Code

First, we need to define the dataset and learning rate

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2019/1/21 21:06
# @Author  : Arrow and Bullet
# @FileName: gradient_descent.py
# @Software: PyCharm
# @Blog    :https://blog.csdn.net/qq_41800366

from numpy import *

# Size of the dataset: 20 data points
m = 20
# x coordinates and the corresponding design matrix
X0 = ones((m, 1))  # an m x 1 vector of ones, i.e. x0
X1 = arange(1, m+1).reshape(m, 1)  # an m x 1 vector, i.e. x1, taking values 1 to m
X = hstack((X0, X1))  # stack by columns to form the sample data matrix
# corresponding y coordinates
y = array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)
# learning rate
alpha = 0.01

Next, we define the cost function and its gradient in matrix-vector form.

# Define the cost function
def cost_function(theta, X, Y):
    diff = dot(X, theta) - Y  # dot() multiplies arrays like matrices
    return (1/(2*m)) * dot(diff.transpose(), diff)


# Define the gradient function corresponding to the cost function
def gradient_function(theta, X, Y):
    diff = dot(X, theta) - Y
    return (1/m) * dot(X.transpose(), diff)

Finally, the core part of the algorithm: the gradient descent iteration.

# Gradient descent iteration
def gradient_descent(X, Y, alpha):
    theta = array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, Y)
    while not all(abs(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, Y)
    return theta


optimal = gradient_descent(X, y, alpha)
print('optimal:', optimal)
print('cost function:', cost_function(optimal, X, y)[0][0])

When every component of the gradient is smaller than 1e-5 in absolute value, it means we have entered a relatively flat region, similar to a valley floor. At that point further iterations gain little, so we can exit the loop!
Run the code; the computed results are as follows:

print('optimal:', optimal)  # result: [[0.51583286] [0.96992163]]
print('cost function:', cost_function(optimal, X, y)[0][0])  # 1.014962406233101
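As a side note (my own addition, not part of the original code), a common safeguard is to also cap the number of iterations, so that the loop terminates even if a poorly chosen alpha prevents the gradient from ever falling below the 1e-5 threshold:

# Variant of the iteration with an extra safeguard (reuses the imports and
# gradient_function defined above): stop after max_iter steps even if the
# gradient has not fallen below the threshold.
def gradient_descent_capped(X, Y, alpha, max_iter=100000):
    theta = array([1, 1]).reshape(2, 1)
    for _ in range(max_iter):
        gradient = gradient_function(theta, X, Y)
        if all(abs(gradient) <= 1e-5):
            break
        theta = theta - alpha * gradient
    return theta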

Finally, we draw the fitted line with matplotlib:

# Plot the data points and the fitted line
def plot(X, Y, theta):
    import matplotlib.pyplot as plt
    ax = plt.subplot(111)  # this line is my own change
    ax.scatter(X, Y, s=30, c="red", marker="s")
    plt.xlabel("X")
    plt.ylabel("Y")
    x = arange(0, 21, 0.2)  # range of x values for the fitted line
    y = theta[0] + theta[1]*x
    ax.plot(x, y)
    plt.show()


plot(X1, y, optimal)

The fitted straight line is as follows
[Figure: the fitted straight line through the data points]

5. Summary

So far, the basic idea and algorithm flow of the gradient descent method have been introduced, and a simple case of using gradient descent to fit a straight line has been implemented in Python!
Finally, let's return to the scenario proposed at the beginning of the article:
the person going down the mountain represents the gradient descent algorithm itself, the path down the mountain represents the sequence of parameters Θ that the algorithm searches through, the steepest direction at the current point on the mountain is the direction of the gradient of the cost function at that point, the tool used to observe the steepest direction in the scenario is differentiation, and the distance walked before the next observation corresponds to the learning rate α in the algorithm.
As you can see, the scenario assumption and the gradient descent algorithm correspond very well!

Part of the content of this article comes from a senior; thank you very much for sharing!
