Detailed Explanation of the Gradient Descent Algorithm in Machine Learning

【What is Gradient Descent】

First of all, we can break "gradient descent" into gradient + descent. The gradient can be understood as the derivative (for a multi-dimensional function, the partial derivatives), so the combination reads as "derivative descent". The natural question is: what is this descent for? Here is the answer directly: gradient descent is used to find the value of the independent variable at which a certain function attains its minimum.

"A certain function" in this sentence refers to the loss function (also called the cost function); put plainly, it is the error function.

Different parameter values of an algorithm produce different fitted curves, and therefore different errors.

The loss function is a function whose independent variables are the parameters of the algorithm and whose value is the error. Gradient descent is therefore used to find the parameter values at which the error is minimized.
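For a concrete illustration of a loss function whose input is a parameter and whose output is an error value, here is a minimal sketch: the data points and the one-parameter model y = w·x below are made up purely for illustration, and the error is measured as the mean squared error.

```python
# A minimal sketch: the data and the single-parameter model y = w * x
# are hypothetical, chosen only to illustrate "loss as a function of a parameter".
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]  # roughly y = 2x

def loss(w):
    """Mean squared error of the fitted line y = w * x against the data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(loss(1.0))  # a poor parameter -> large error
print(loss(2.0))  # a good parameter -> small error
```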

[A bit of supplementary knowledge]

There is a type of algorithm in machine learning that generates a curve to fit existing data, so that future data can be predicted. We call this type of algorithm regression.

There is another type of algorithm that also generates a curve, but this curve is used to separate points into two groups, i.e. to classify them. We call this type of algorithm classification.

However, the curves generated by these two kinds of algorithms do not pass exactly through the existing points; there is an error between the fitted curve and the true values. We generally use the value of the loss function to measure this error, so the smaller the loss, the better the fit. (A simple way to put it: the loss function represents the error between the predicted values and the actual values.)

[Why do we descend the gradient]

Why descend the gradient? It can be understood as: to find the independent variable corresponding to the minimum value of the error function (loss function).

As shown in the figure below, the curve of this quadratic function is an error function (loss function): the independent variable is a parameter of the algorithm, and the function value is the error between the fitted curve generated with that parameter and the true values.

(Note: Generally, when you see the formula for gradient descent, it helps to picture a figure like the one below, and to assume the error function is this well-behaved: it opens upward, it is smooth, it has only one point where the derivative is 0, and it bends only once rather than many times.)
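As a concrete made-up example of such a well-behaved error function, take J(x) = (x − 3)². Its derivative J′(x) = 2(x − 3) is zero only at x = 3, and that single point is the minimum, with J(3) = 0. The scenarios below can all be read with this kind of curve in mind.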

[How does gradient descent work]

Let's again take the picture above as an example. If you want to find the independent variable x corresponding to the minimum value, how do you find it? Remember, our goal is to find the independent variable x.

Scenario 1:

The curve is monotonically decreasing and the derivative is negative (the gradient is negative). We want to find the value of the independent variable corresponding to the minimum of the function (the value of x at the lowest point of the curve).

Which way do we go?

Naturally, we slide horizontally to the right, that is, we increase x. As x increases, the absolute value of the derivative (gradient) decreases (now you can see the meaning of "gradient descent").

Scenario 2:

The curve is monotonically increasing and the derivative is positive (the gradient is positive). Again, we want to find the value of the independent variable corresponding to the minimum of the function (the value of x at the lowest point of the curve).

Which way do we go?

Naturally, we slide horizontally to the left, that is, we decrease x. As x decreases, the absolute value of the derivative (gradient) decreases (again, gradient descent).
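Both scenarios collapse into one update rule: move x a small step against the sign of the derivative. A minimal sketch, using the made-up quadratic loss J(x) = (x − 3)² from the note above (the learning rate is an arbitrary illustrative choice):

```python
# A minimal sketch with the made-up loss J(x) = (x - 3)^2 from above.
# Moving x a small step against the sign of the derivative covers both scenarios.

def dJ(x):
    """Derivative of J(x) = (x - 3)^2."""
    return 2 * (x - 3)

learning_rate = 0.1

# Scenario 1: to the left of the minimum, derivative is negative -> x increases.
x = 1.0
x = x - learning_rate * dJ(x)
print(x)  # 1.4, moved to the right

# Scenario 2: to the right of the minimum, derivative is positive -> x decreases.
x = 5.0
x = x - learning_rate * dJ(x)
print(x)  # 4.6, moved to the left
```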

【Summary】

The meaning of gradient descent can be roughly understood as follows:

Change the value of x so that the absolute value of the derivative becomes smaller. When the derivative is less than 0 (Scenario 1), we make the current value of x a little larger, and then look at the derivative again.

When the derivative is greater than 0 (Scenario 2), we make the current value of x a little smaller, and then look at the derivative again.

When the derivative is close to 0, we have found the desired independent variable x, that is, the optimal parameters of the algorithm, the ones that minimize the error between the fitted curve and the true values.

Stated more formally: gradient descent finds the parameter values the algorithm should take so that the error is minimized.

Gradient descent is the most commonly used method for minimizing the cost function (loss function) in machine learning, and it only requires the first derivative of the loss function.
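To make the "repeat until the derivative is close to 0" procedure concrete, here is a minimal sketch of the full loop, still with the made-up loss J(x) = (x − 3)²; the starting point, learning rate, and tolerance are arbitrary illustrative choices:

```python
# A minimal sketch of the full loop: keep stepping against the derivative
# until its absolute value is close to 0. J(x) = (x - 3)^2 is made up for illustration.

def dJ(x):
    return 2 * (x - 3)

x = 0.0            # arbitrary starting point
learning_rate = 0.1
tolerance = 1e-6

while abs(dJ(x)) > tolerance:
    x = x - learning_rate * dJ(x)

print(x)  # close to 3, the minimizer of J
```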

Suppose h_θ(x) is the objective (hypothesis) function and J(θ) is the loss function. Taking the usual squared-error loss over m training samples as an example,

J(θ) = (1/2m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

The gradient (i.e. the partial derivative) of the loss function with respect to each parameter θⱼ is obtained by substituting h_θ(x) into J(θ) and differentiating (the derivation is not expanded in detail here):

∂J(θ)/∂θⱼ = (1/m) · Σᵢ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾

Updating θ in the negative direction of the gradient then gives the gradient descent algorithm:

θⱼ := θⱼ − α · ∂J(θ)/∂θⱼ

where α is the learning rate (step size).
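A minimal sketch of this update rule in code, assuming a linear hypothesis h_θ(x) = θ₀ + θ₁·x and the squared-error loss above; the data points, learning rate, and iteration count are made up for illustration:

```python
# Gradient descent for a linear hypothesis h(x) = theta0 + theta1 * x
# with the squared-error loss J(theta) = (1/2m) * sum((h(x_i) - y_i)^2).
# Data, learning rate, and iteration count are made-up illustrative choices.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
m = len(xs)

theta0, theta1 = 0.0, 0.0
alpha = 0.05  # learning rate

for _ in range(2000):
    # Partial derivatives of J with respect to theta0 and theta1.
    grad0 = sum((theta0 + theta1 * x - y) for x, y in zip(xs, ys)) / m
    grad1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    # Update each parameter in the negative direction of its gradient.
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)  # theta1 ends up close to 2, matching the made-up data
```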

Reference article: "{High school students can understand} What is gradient descent?" - Zhihu


Source: blog.csdn.net/baidu_41774120/article/details/124941746