Mathematical principles of machine learning: the concept of the gradient

Derivative:

The derivative of a function at a point describes the trend and the rate of change of the function near that point. Take an object thrown upward as an example. Its (position-time), (velocity-time), and (acceleration-time) curves can be represented by the following figure:

Among them, the (velocity-time) curve is the collection of the values of the derivative of the (position-time) curve at each point. In the same way, the (acceleration-time) curve is the derivative of the (velocity-time) curve.

Take the figure as an example. At the beginning, the position changes quickly, which is reflected in the large value on the velocity curve. As time progresses, the rate of change of the position gradually becomes smaller, until the position curve reaches its highest point, where its tangent is parallel to the x-axis. Correspondingly, the velocity gradually decreases to 0: the object reaches the top of its trajectory and is momentarily at rest.

After the position function passes its highest point, the rate of change of the position grows again from small to large, and the magnitude of the velocity increases from 0, with a sign opposite to that of the upward throw. Finally, at 4 seconds, the object returns to its starting position and the speed reaches its maximum, which equals the initial speed given by the upward throw.

By the same analysis, the velocity curve is a straight line with a slope of -2, which means that the velocity changes at a steady rate of -2 throughout the process. Physics has a special name for this rate of change of velocity: acceleration.

So the acceleration curve is a constant.
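The relationship between the three curves can be checked numerically. The sketch below assumes a position function s(t) = 4t - t^2, which is not given in the text; it is chosen only because it matches the description (a velocity slope of -2 and a return to the starting point at 4 seconds). The helper name `derivative` is illustrative as well.

```python
def derivative(f, t, h=1e-5):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)


def position(t):
    # Assumed position curve (not from the article): s(t) = 4t - t^2
    return 4 * t - t ** 2


def velocity(t):
    # Velocity is the derivative of position
    return derivative(position, t)


def acceleration(t):
    # Acceleration is the derivative of velocity; a larger step keeps
    # the nested finite difference numerically stable
    return derivative(velocity, t, h=1e-3)


for t in [0, 1, 2, 3, 4]:
    print(t, round(position(t), 3), round(velocity(t), 3), round(acceleration(t), 3))
```

Under this assumption the velocity starts at the initial speed, passes through 0 at the highest point, and the acceleration stays at the same value for every t.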

Multivariate function:

Elementary mathematics mostly deals with unary functions, in which only one factor causes the dependent variable to change. In most practical cases, however, more than one factor determines the dependent variable. Take the housing price in a certain area as an example: the factors that drive price changes may include floor area, school district, location, transportation, city size, and so on. Each factor can be treated as an independent variable, and a function with more than one independent variable is called a multivariate function.

Partial derivative:

As stated earlier, the derivative is the rate of change of the function value with respect to the independent variable. A multivariate function, however, has several independent variables, and each of them can change independently. So how can the rate of change of its function value be obtained?

Take the housing price as an example. Suppose the housing price depends on three factors, area, transportation, and school district, and is written as f(x, y, z), where x stands for area, y stands for transportation, and z stands for school district.
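First fix the transportation and the school district, and consider the effect of a small change in the area x on the housing price f(x, y, z), expressed as:

\frac{\partial f}{\partial x}=\lim_{\Delta x\to 0}\frac{f(x+\Delta x,\ y,\ z)-f(x,\ y,\ z)}{\Delta x}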

In the same way, fixing the area and the school district at a certain point gives the rate of change with respect to transportation, and fixing the area and the transportation gives the rate of change with respect to the school district.

In a multivariate function, when all other variables are regarded as constants and the derivative with respect to one variable is calculated by the usual differentiation rules, the result is called a partial derivative. A partial derivative changes only one independent variable at a time, so the calculation is the same as differentiating a unary function.
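As a minimal sketch of "change one variable, hold the others fixed", the code below estimates partial derivatives by central differences. The `price` function and its coefficients are hypothetical, made up purely for illustration; the article does not give a concrete price model. The helper name `partial` is also an assumption.

```python
def price(x, y, z):
    # Hypothetical price model, for illustration only:
    # x = area, y = transportation score, z = school district score
    return 3.0 * x + 0.5 * x * y + 2.0 * z ** 2


def partial(f, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f with
    respect to the variable at position `index`, all others held fixed."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)


point = (100.0, 4.0, 3.0)
df_dx = partial(price, point, 0)  # analytic value: 3 + 0.5*y
df_dy = partial(price, point, 1)  # analytic value: 0.5*x
df_dz = partial(price, point, 2)  # analytic value: 4*z
print(df_dx, df_dy, df_dz)
```

Collecting the three estimates into one tuple `(df_dx, df_dy, df_dz)` gives a numerical approximation of the vector of partial derivatives at that point.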

At this point we have in fact already arrived at the gradient: the vector formed by the partial derivatives with respect to each independent variable is the gradient.

For example, the function:

    

Its partial derivatives are:

 

   

Gradient:

Baidu Encyclopedia defines the gradient as follows: the gradient is a vector whose direction is the one along which the directional derivative of the function at a point attains its maximum. In other words, at that point the function changes fastest along this direction (the direction of the gradient), and the largest rate of change is the magnitude (modulus) of the gradient.

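Let the equation of the cone be:

z=f(x,y)=a\sqrt{x^{2}+y^{2}}

where a is an adjustable parameter; here a=2.

Then for each point P(x,y) on the surface, the gradient can be defined as

\operatorname{grad} f(x,y)=\frac{\partial f}{\partial x}\vec{i}+\frac{\partial f}{\partial y}\vec{j}

where

\frac{\partial f}{\partial x}=\frac{ax}{\sqrt{x^{2}+y^{2}}}

Similarly,

\frac{\partial f}{\partial y}=\frac{ay}{\sqrt{x^{2}+y^{2}}}

Therefore, at any point of the cone other than the apex, the gradient vector is

\left(\frac{ax}{\sqrt{x^{2}+y^{2}}},\ \frac{ay}{\sqrt{x^{2}+y^{2}}}\right)

Denote it as:

\nabla f(x,y)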

In the special cases of points on the intersections of the cone with the sections y=0 and x=0, the gradient is

\nabla f(x,0)=(\pm a,\ 0),\qquad \nabla f(0,y)=(0,\ \pm a)

As shown in the figure below, since the component of the gradient in the other direction is 0, the gradient vector attains its full magnitude along a coordinate axis, which is consistent with intuition.

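Intuitively, no matter which point on the cone we start from, the magnitude of its gradient is the same (only the direction differs). We can verify this:

|\nabla f|=\sqrt{\left(\frac{\partial f}{\partial x}\right)^{2}+\left(\frac{\partial f}{\partial y}\right)^{2}}

and so

|\nabla f|=\sqrt{\frac{a^{2}x^{2}+a^{2}y^{2}}{x^{2}+y^{2}}}=a

Therefore, the magnitude of the gradient at any point on the cone (other than the apex, where the gradient is undefined) is a. It is equal everywhere, including on the two lines where the cone intersects the coordinate planes, as calculated above.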
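The constant magnitude can also be checked numerically. This is only a sketch: it assumes the cone has the form z = a * sqrt(x^2 + y^2), the form consistent with a gradient magnitude of a everywhere, and the sample points are arbitrary choices.

```python
import math

A = 2.0  # the parameter a from the text


def cone(x, y):
    # Assumed cone equation: z = a * sqrt(x^2 + y^2)
    return A * math.sqrt(x * x + y * y)


def gradient(f, x, y, h=1e-6):
    """Central-difference estimate of (df/dx, df/dy)."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy


# Arbitrary sample points away from the apex (0, 0)
points = [(1.0, 0.0), (0.0, 1.0), (3.0, 4.0), (-2.0, 5.0)]
norms = [math.hypot(*gradient(cone, x, y)) for x, y in points]
print(norms)
```

The apex is deliberately excluded, because the partial derivatives are not defined there.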

The operator

\nabla=\frac{\partial}{\partial x}\vec{i}+\frac{\partial}{\partial y}\vec{j}

is called the gradient operator in two-dimensional space.

The v1 vector in the figure below is the gradient vector

The concept of the gradient can also be explained from the perspective of the directional derivative. The directional derivative is the rate at which the function f(x,y) changes along a straight line L in the xoy plane. This line is the projection onto the xoy plane of a corresponding curve on the surface, and the vertical plane that carries out this projection is called the projection plane. The directional derivative is therefore the rate of change of the space curve along its own projection line in the xoy plane, just as dy/dx represents the rate of change of a plane curve along the x-axis at a certain point (that is, the slope of the tangent line).

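First look at the definition of the directional derivative:

\frac{\partial f}{\partial l}=\lim_{\rho\to 0}\frac{f(x+\Delta x,\ y+\Delta y)-f(x,y)}{\rho},\qquad \rho=\sqrt{(\Delta x)^{2}+(\Delta y)^{2}}

For a differentiable function it can be converted into the following form:

\frac{\partial f}{\partial l}=\frac{\partial f}{\partial x}\cos\theta+\frac{\partial f}{\partial y}\sin\theta

The angle θ is the angle between the line l and the x-axis.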

So, in which direction should L point for the directional derivative to attain its maximum?

Let

\vec{e}=(\cos\theta,\ \sin\theta)

be the unit vector in the direction L.

Then the directional derivative can be written as:

\frac{\partial f}{\partial l}=\nabla f\cdot\vec{e}=|\nabla f|\cos\varphi

The angle φ is the angle between the projection line L and the gradient. It is not difficult to conclude that when φ is 0, the directional derivative takes its maximum value. In other words, when the direction L coincides with the direction of the gradient, the rate of change is the largest.

The maximum value of the directional derivative is the magnitude of the gradient:

\left.\frac{\partial f}{\partial l}\right|_{\max}=|\nabla f|=\sqrt{\left(\frac{\partial f}{\partial x}\right)^{2}+\left(\frac{\partial f}{\partial y}\right)^{2}}
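A short numerical check of this claim: scan the direction angle over a full turn and compare the best direction with the gradient direction. The function f(x, y) = x^2 + 3xy and the point (1, 2) are hypothetical choices for illustration and do not come from the article.

```python
import math


def grad_f(x, y):
    # Analytic gradient of the hypothetical function f(x, y) = x^2 + 3xy
    return (2 * x + 3 * y, 3 * x)


def directional_derivative(x, y, theta):
    """df/dl = f_x * cos(theta) + f_y * sin(theta)."""
    gx, gy = grad_f(x, y)
    return gx * math.cos(theta) + gy * math.sin(theta)


x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
grad_angle = math.atan2(gy, gx)  # direction of the gradient
grad_norm = math.hypot(gx, gy)   # magnitude of the gradient

# Scan directions in 1-degree steps and keep the one with the largest value
best_theta = max(
    (math.radians(d) for d in range(360)),
    key=lambda t: directional_derivative(x0, y0, t),
)

print(math.degrees(best_theta), math.degrees(grad_angle))
print(directional_derivative(x0, y0, grad_angle), grad_norm)
```

Because the scan uses whole degrees, the best scanned angle should agree with the gradient angle only to within half a degree, while the directional derivative evaluated exactly along the gradient should match the gradient magnitude up to floating-point rounding.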

Another concept closely related to the gradient is the contour line. For a surface in three-dimensional space, a contour line is the projection onto the xoy plane of the intersection between the surface and a plane parallel to the xoy plane. For example, for the saddle surface shown in the figure below, each family of contour lines consists of hyperbolas.

In motion:

Once contour lines are introduced, an interesting relationship between contour lines and gradients appears: the contour line and the gradient direction are perpendicular to each other. Taking the contour lines of a three-dimensional surface as an example, the derivation is as follows:

The contour equation is:

\left\{\begin{matrix} z=f(x,y)\\ z=c \end{matrix}\right.

The projection of this curve onto the xoy plane is again a curve, whose equation in the xoy plane is:

f(x,y)=c

To find the slope at a point P on the curve, differentiate both sides with respect to x using implicit differentiation:

\begin{aligned} \frac{\partial f(x,y)}{\partial x}\cdot 1+\frac{\partial f(x,y)}{\partial y}\cdot\frac{dy}{dx}&=0\\ \frac{dy}{dx}&=-\frac{\frac{\partial f(x,y)}{\partial x}}{\frac{\partial f(x,y)}{\partial y}} \end{aligned}

And the gradient vector is defined as:

\operatorname{grad} f(x,y)=\begin{bmatrix} \frac{\partial f(x,y)}{\partial x}, & \frac{\partial f(x,y)}{\partial y} \end{bmatrix}

The slope of the gradient direction is:

\begin{aligned} \mathrm{Slope}_{\mathrm{grad}}&=\frac{\frac{\partial f(x,y)}{\partial y}}{\frac{\partial f(x,y)}{\partial x}}\\ \mathrm{Slope}_{\mathrm{grad}}\cdot\frac{dy}{dx}&=-1 \end{aligned}

Therefore, the gradient direction is perpendicular to the tangent direction of the contour.
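The perpendicularity can be checked with a dot product instead of slopes, which also covers points where one partial derivative is zero. The sketch below assumes the saddle surface f(x, y) = x^2 - y^2 as a concrete example; the article shows a saddle but does not state its equation, and the sample points are arbitrary.

```python
def grad_f(x, y):
    # Gradient of the assumed saddle surface f(x, y) = x^2 - y^2
    return (2 * x, -2 * y)


def contour_tangent(x, y):
    """Tangent direction of the contour f(x, y) = c through (x, y).

    From dy/dx = -f_x / f_y, a tangent vector is (f_y, -f_x),
    which remains valid when f_y is zero.
    """
    fx, fy = grad_f(x, y)
    return (fy, -fx)


for x, y in [(2.0, 1.0), (1.0, 3.0), (-1.5, 0.5)]:
    g = grad_f(x, y)
    t = contour_tangent(x, y)
    dot = g[0] * t[0] + g[1] * t[1]  # zero when the vectors are perpendicular
    print((x, y), dot)
```

A zero dot product at every sampled point is the vector form of the statement that the product of the two slopes is -1.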

The magnitude of the gradient of the cone surface calculated above is the same everywhere, namely a, where a=2. To illustrate the geometric meaning of the directional derivative, gradient vector, contour line, and contour normal discussed above, I will end this article with a picture:

 The projection of the point on the x-axis:

end!


Origin blog.csdn.net/tugouxp/article/details/109156764