Big data course K16 - Spark's gradient descent method

Author's email: [email protected]  Address: Huizhou, Guangdong

▲ This chapter's objectives

⚪ Understand Spark’s gradient descent method;

⚪ Understand Spark’s family of gradient descent methods (BGD, SGD, MBGD);

⚪ Master implementing SGD with Spark's MLlib.

1. Concept of Gradient Descent Method

1. Overview

When solving for the model parameters of a machine learning algorithm, that is, when solving an unconstrained optimization problem, the gradient descent method is one of the most commonly used approaches. Another commonly used approach is the least squares method. Here is a brief introduction to the gradient descent method.

The least squares method is suitable when the model equations have an analytical solution. If a function has no analytical solution, the least squares method cannot be used; in that case, the true solution can only be approximated by numerical (iterative) methods.

When an equation has no analytical solution, its coefficients cannot be written as closed-form expressions in the variables.

In this sense, the gradient descent method is more broadly applicable than the least squares method.
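To make the idea of an "analytical solution" concrete, here is a standard textbook example (our own illustration, not from the original article): for linear least squares with design matrix X and target vector y, the optimal coefficients have the closed form

\theta = (X^{T} X)^{-1} X^{T} y ,

provided X^{T} X is invertible. When no such closed-form expression exists, an iterative method such as gradient descent is needed instead.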

2. What is a gradient?

In calculus, the gradient of a multivariate function is obtained by taking the partial derivative with respect to each of its parameters and writing those partial derivatives together as a vector.

For example, for the function f(x, y), taking the partial derivatives with respect to x and y gives the gradient vector (∂f/∂x, ∂f/∂y)^T, written as grad f(x, y) or ∇f(x, y).

The gradient vector at a specific point (x0, y0) is (∂f/∂x0, ∂f/∂y0)^T, also written ∇f(x0, y0). For a function of three parameters, the gradient vector is (∂f/∂x, ∂f/∂y, ∂f/∂z)^T, and so on.
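As a small worked example (the function is our own choice, not from the original article): take f(x, y) = x^2 + 2y^2. Then

\nabla f(x, y) = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)^{T} = (2x, 4y)^{T} ,

so at the point (1, 1) the gradient vector is (2, 4)^T.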

3. What is the significance of finding this gradient vector?

Geometrically, the gradient points in the direction in which the function changes fastest.

Specifically, for the function f(x, y) at the point (x0, y0), the direction of the gradient vector (∂f/∂x0, ∂f/∂y0)^T is the direction in which f(x, y) increases fastest. In other words, moving along the direction of the gradient vector makes it easier to find the maximum value of the function.

Conversely, along the opposite direction of the gradient vector, that is, the direction of -(∂f/∂x0, ∂f/∂y0)^T, the function decreases fastest, which makes it easier to find the minimum value of the function.
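The standard calculus fact behind both statements (added here for completeness) is the directional derivative: for a unit vector u,

D_{u} f(x_0, y_0) = \nabla f(x_0, y_0) \cdot u = \lVert \nabla f(x_0, y_0) \rVert \cos\theta ,

where \theta is the angle between u and the gradient. The rate of change is largest when u points along the gradient (\theta = 0) and most negative when u points opposite to it (\theta = \pi), which is exactly why descent steps follow -\nabla f.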

2. Gradient descent method and gradient ascent method

In machine learning algorithms, when minimizing a loss function, the gradient descent method can be used to solve iteratively, step by step, for the minimized loss function and the corresponding model parameter values.

Conversely, if we need to solve for the maximum value of the loss function, we use the gradient ascent method to iterate instead.
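Before looking at Spark's MLlib, the iterative update itself can be sketched in a few lines of ordinary Scala. This is only an illustrative single-machine sketch, not MLlib's implementation; the loss J(w) = (w - 3)^2, the learning rate and the iteration count are all assumptions made for the example.

object GradientDescentSketch {
  def main(args: Array[String]): Unit = {
    val learningRate = 0.1                              // step size (alpha), assumed for the example
    val iterations = 100                                // number of update steps, assumed for the example
    def gradient(w: Double): Double = 2.0 * (w - 3.0)   // dJ/dw for J(w) = (w - 3)^2
    var w = 0.0                                         // initial parameter guess
    for (_ <- 1 to iterations) {
      w -= learningRate * gradient(w)                   // descent step: w := w - alpha * dJ/dw
      // For gradient ascent (maximizing), the sign flips: w += learningRate * gradient(w)
    }
    println(s"w is approximately $w")                   // should be close to 3.0, the minimizer
  }
}

Each pass of the loop is one "step down the mountain" in the sense of the intuitive explanation in the next section: compute the gradient at the current position, then move a small distance in the opposite direction.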

3. Intuitive explanation of gradient descent method

First, let's look at an intuitive explanation of gradient descent. Imagine we are standing somewhere on a large mountain. Since we do not know the way down, we decide to go one step at a time: at each position we compute the gradient, then take a step along the negative gradient direction, that is, downhill along the steepest direction from where we stand. We then compute the gradient at the new position and again step along the steepest downhill direction, and so on, descending step by step until we reach the bottom of the valley.

From this explanation we can see that gradient descent does not necessarily find the global optimum; it may stop at a local optimum. Of course, if the loss function is a convex function, the solution obtained by gradient descent is guaranteed to be the global optimum.

4. Related concepts of the gradient descent method


Origin blog.csdn.net/u013955758/article/details/132567488