Gradient Descent Method-Optimization Algorithm-Machine Learning

1. Overview

Gradient descent (GD for short) is a first-order optimization algorithm. Its main purpose is to find the minimum of the objective function through iteration, or at least to converge toward that minimum.

The gradient descent method is an iterative method that can be used to solve least squares problems, both linear and nonlinear. When solving for the model parameters of machine learning algorithms, that is, unconstrained optimization problems, gradient descent and the least squares method are the two most commonly used approaches. To minimize a loss function, gradient descent can be applied iteratively to obtain the minimized loss function and the corresponding model parameter values. In machine learning, two variants have been developed on top of the basic gradient descent method, namely the stochastic gradient descent method and the batch gradient descent method.

ps: To find the maximum value of a loss function, use the gradient ascent method instead.

ps: Gradient descent is a heuristic algorithm, because each new solution is computed from the result of the previous one; the method can also be regarded as the training rule of the simplest single-layer neural network.

2. Concept

Prediction function: In machine learning, a prediction function is used to fit the samples. For example, in the linear regression expression $Y = W_1X_1 + W_2X_2 + \dots + W_NX_N + B$, the $W_i$ are the weights, $Y$ is the output value, also called the predicted value, and the $X_i$ are the input values, also called the sample feature values. $B$ is the intercept term; it keeps the trained model from being forced through the origin and widens the range of data the model can fit.
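
As a minimal sketch (the function name and NumPy usage here are just for illustration), the prediction function above can be written as:

```python
import numpy as np

def predict(X, w, b):
    """Linear prediction Y = X @ w + b.
    X: (n_samples, N) feature values, w: (N,) weights, b: scalar intercept."""
    return X @ w + b
```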
Loss function: The loss function is used to evaluate how well the trained model fits the data. The larger the loss function, the worse the fit. It can be evaluated using the least squares criterion.
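
For example, a least squares loss (the mean squared error) can be sketched as follows, with illustrative names:

```python
import numpy as np

def mse_loss(y_hat, y):
    """Mean squared error: the larger the value, the worse the fit."""
    return np.mean((y_hat - y) ** 2)
```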

In machine learning, the fundamental purpose of the gradient descent method is to find the minimum value of the loss function iteratively, adjusting each weight through backpropagation until the fit of the model reaches the specified requirement.
Step size (learning rate): The step size determines how far each iteration moves in the negative direction of the gradient; in the mountain-descent analogy, it is the length of each step taken along the steepest path. In machine learning it is also called the learning rate.
Gradient:

  • For a function of a single variable, the gradient is simply the derivative of the function, which represents the slope of the tangent line at a given point.
  • For a multivariable function, the gradient is a vector, and a vector has a direction: the gradient points in the direction in which the function rises fastest at a given point, as the numerical sketch below illustrates.
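
To make the direction property concrete, here is a small numerical check using central differences (purely illustrative; this is not how gradients are usually computed in practice):

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x by central differences."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Example: f(x, y) = x^2 + y^2. The gradient at (1, 2) is (2, 4),
# which points in the direction of fastest increase of f.
print(numerical_gradient(lambda v: v[0]**2 + v[1]**2, np.array([1.0, 2.0])))
```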

3. Formulas and solution steps

$$\Theta^1 = \Theta^0 - \alpha \nabla J(\Theta^0)$$

The meaning of this formula is: $J$ is a function of $\Theta$. Our current position is the point $\Theta^0$, and we want to move from this point to the minimum point of $J$, which is the bottom of the mountain. First we determine the direction of travel, which is the reverse of the gradient direction; then we walk a distance determined by the step size $\alpha$. After taking this step, we arrive at the point $\Theta^1$.
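
For example, take the one-variable case $J(\Theta) = \Theta^2$ with starting point $\Theta^0 = 1$ and step size $\alpha = 0.1$. Since $J'(\Theta) = 2\Theta$, one step gives

$$\Theta^1 = \Theta^0 - \alpha J'(\Theta^0) = 1 - 0.1 \times 2 = 0.8,$$

which moves us closer to the minimum at $\Theta = 0$.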

1. Determine the gradient of the loss function at the current position.

2. Multiply the gradient by the step size to obtain the distance to descend from the current position.

3. Check whether every descent distance is less than $\varepsilon$; if so, the algorithm terminates, otherwise go to step (4).

4. Update the position, and after the update return to step (1). (A runnable sketch of this loop follows.)
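
Putting the four steps together, here is a minimal sketch of the loop (the example gradient function and starting point are assumptions chosen for illustration):

```python
import numpy as np

def gradient_descent(grad, theta0, alpha=0.1, eps=1e-6, max_iter=1000):
    """Steps 1-4: compute the gradient, scale it by the step size,
    stop when the descent distance is below eps, otherwise update."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = alpha * grad(theta)        # steps 1-2: gradient times step size
        if np.linalg.norm(step) < eps:    # step 3: termination test
            break
        theta = theta - step              # step 4: update and repeat
    return theta

# Minimizing J(theta) = theta_1^2 + theta_2^2, whose gradient is 2*theta:
print(gradient_descent(lambda t: 2 * t, [1.0, 2.0]))  # converges toward (0, 0)
```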

4. Several commonly used gradient descent methods

 (1) Full gradient descent algorithm (FG)

Calculate the errors of all samples in the training set, then sum and average them to form the objective function. The weight vector moves in the direction opposite to its gradient, which reduces the current objective function the most. Because every update requires computing gradients over the entire data set, the method is very slow. At each step it computes the gradient of the loss function with respect to the parameter $\theta$ over the whole training set:

$$\theta = \theta - \eta \cdot \nabla_\theta J(\theta)$$
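
For linear least squares, where the gradient of the mean squared error is $\frac{2}{n} X^\top (X\theta - y)$, one full-gradient update can be sketched as follows (variable names are illustrative):

```python
import numpy as np

def fg_step(X, y, theta, eta=0.01):
    """One full-gradient update: every sample in X contributes to the gradient."""
    n = X.shape[0]
    grad = 2.0 / n * X.T @ (X @ theta - y)  # averaged over the whole training set
    return theta - eta * grad
```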

(2) Stochastic gradient descent algorithm (SG)

Since FG must compute the errors of all samples for every weight update, and practical problems often have hundreds of millions of training samples, it is inefficient and can easily get stuck in a local optimum. The stochastic gradient descent algorithm was therefore proposed. The objective function of each round is no longer the error over all samples but the error of a single sample: each update computes the gradient of the objective function for just one sample, then the next sample is drawn and the process repeats, until the loss function stops decreasing or its value falls below a preset threshold. This process is simple and efficient, and its noise usually helps prevent the update iterations from converging to a local optimum. Its iteration form is

$$\theta = \theta - \eta \cdot \nabla_\theta J(\theta; x^{(i)}, y^{(i)})$$

where $(x^{(i)}, y^{(i)})$ is the single sample drawn at the current step.
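
The corresponding stochastic sketch, in the same hypothetical least-squares setting as above, draws one random sample per update:

```python
import numpy as np

def sg_step(X, y, theta, eta=0.01, rng=np.random.default_rng()):
    """One stochastic update: the gradient comes from a single random sample."""
    i = rng.integers(X.shape[0])
    xi, yi = X[i], y[i]
    grad = 2.0 * xi * (xi @ theta - yi)  # gradient of one sample's squared error
    return theta - eta * grad
```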

(3) Mini-batch gradient descent algorithm

The mini-batch gradient descent algorithm is a compromise between FG and SG that retains the advantages of both to some extent. At each step, a small sample set is drawn at random from the training set, and an FG-style update of the weights is performed on that subset. The number of samples in the subset is called the batch_size; it is usually set to a power of 2, which is more convenient for GPU acceleration. In particular, with batch_size = 1 the method degenerates to SG, and with batch_size = n (the full training set) it becomes FG. Its iteration form is:

$$\theta = \theta - \eta \cdot \nabla_\theta J\left(\theta; x^{(i:i+\text{batch\_size})}, y^{(i:i+\text{batch\_size})}\right)$$
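
And the mini-batch version, where batch_size interpolates between SG (batch_size = 1) and FG (batch_size = n), again with illustrative names:

```python
import numpy as np

def minibatch_step(X, y, theta, eta=0.01, batch_size=32, rng=np.random.default_rng()):
    """One mini-batch update: average the gradient over a random small sample set.
    Assumes batch_size <= number of samples in X."""
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2.0 / batch_size * Xb.T @ (Xb @ theta - yb)
    return theta - eta * grad
```

Given X, y and an initial theta, any of the three step functions above can be called inside the loop from Section 3 in place of a fixed gradient function.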

 



 
