An intuitive understanding of the gradient and its relationship to partial derivatives, directional derivatives, and normal vectors

Blog: blog.shinelee.me | cnblogs | CSDN

Preface

The gradient is a fundamental concept in calculus and the basis of gradient descent, one of the most frequently used optimization tools in machine learning. Although it is mentioned and heard all the time, its details, geometric interpretation, and physical meaning are worth digging into. Without a clear picture of these, the gradient remains a "familiar stranger": something you merely memorize, and then inevitably feel uneasy about when you use it. To put that unease to rest, this article tries to answer the following questions intuitively:

  • What is the relationship between the gradient and partial derivatives?
  • What is the relationship between the gradient and directional derivatives?
  • Why is the gradient direction the direction of fastest increase, and the negative gradient direction the direction of fastest decrease?
  • What is the physical meaning of the gradient's magnitude?
  • Why is the gradient perpendicular to the contour lines in a contour plot?
  • How is the gradient related to the total differential and to implicit functions?
  • Why does the gradient sometimes act as a normal vector?

Without further ado, let us get to the point. Throughout this article, the functions under discussion are assumed to be differentiable.

Partial derivative

In the blog post "Differentiation, derivatives, and the chain rule for single-variable functions" (cnblogs | CSDN | blog.shinelee.me), we reviewed the derivatives of common elementary functions. In a nutshell:

The derivative of a single-variable function is its rate of change (slope). The derivative is itself a function: it describes how the rate of change depends on position.

What about a multivariable function? That is where partial derivatives come in.

A partial derivative is the derivative obtained when a multivariable function is "reduced" to a single-variable function. Here "reduced" means fixing the values of all other variables and letting only one variable vary. Keeping each variable in turn, a function of \(n\) variables has \(n\) partial derivatives.

Take a function of two variables as an example. Let \(z = f(x, y)\); plotted in a three-dimensional coordinate system, it looks like the figure below.

[Figure: the surface z = f(x, y)]

Fixing the value of \(y\) and of \(x\) in turn gives the black curves in the figures below: the function "reduced" to a single-variable curve in a two-dimensional coordinate system. The partial derivatives \(\frac{\partial z}{\partial x}\) and \(\frac{\partial z}{\partial y}\) are the derivatives (tangent slopes) of these curves.

[Figure: partial derivative with respect to x]

[Figure: partial derivative with respect to y]

As the figures show, each coordinate axis corresponds to one variable, and the partial derivative at a given position is the derivative (tangent slope) of the function along the direction of the corresponding axis.

[Figure: partial derivative]
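To make "fix the other variable and differentiate" concrete, here is a minimal numerical sketch. The function `f`, the evaluation point, and the helper names `partial_x` and `partial_y` are illustrative assumptions, not part of the original article.

```python
def f(x, y):
    """Example surface z = f(x, y); chosen only for illustration."""
    return x**2 + 3 * x * y + y**3


def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of df/dx: y is held fixed, only x moves."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of df/dy: x is held fixed, only y moves."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


# At (1, 2) the analytic values are f_x = 2x + 3y = 8 and f_y = 3x + 3y^2 = 15.
fx = partial_x(f, 1.0, 2.0)
fy = partial_y(f, 1.0, 2.0)
print(fx, fy)
```

Each helper varies only one argument, which is exactly the "reduction" to a single-variable curve described above.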

Directional derivative

What if the direction is not along a coordinate axis but arbitrary? That is where the directional derivative comes in. In the figure below (from the link "Directional Derivative"), the directional derivative at point \(P\) along the direction of the red arrow is the slope of the black tangent line.

[Figure: directional derivative]

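The directional derivative is the derivative of a function along a given direction. Specifically, take a point \((a, b)\) in the \(xy\)-plane and a unit vector \(\vec u = (\cos \theta ,\sin \theta )\). On the surface \(z=f(x, y)\), start from the point \((a,b, f(a,b))\) and move \(t\) units of length in the direction \(\vec u\); the function value is then \(z = F(t)=f(a+t \cos \theta, b + t \sin \theta)\). The directional derivative at \((a,b)\) in the direction \(\vec u = (\cos \theta ,\sin \theta )\) is: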
\[ \begin{aligned} &\left.\frac{d}{d t} f(a+t \cos \theta, b+t \sin \theta)\right|_{t=0} \\=& \lim _{t \rightarrow 0} \frac{f(a+t \cos \theta, b+t \sin \theta) - f(a, b)}{t} \\=& \lim _{t \rightarrow 0} \frac{f(a+t \cos \theta, b+t \sin \theta) - f(a, b+t \sin \theta)}{t} + \lim _{t \rightarrow 0} \frac{f(a, b+t \sin \theta) - f(a, b)}{t} \\=& \frac{\partial}{\partial x} f(a, b) \frac{d x}{d t}+\frac{\partial}{\partial y} f(a, b) \frac{d y}{d t} \\=& f_x (a, b) \cos \theta+ f_y (a, b) \sin \theta \\=&\left(f_x (a, b), f_y (a, b)\right) \cdot(\cos \theta, \sin \theta) \end{aligned} \]
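The derivation above uses the chain rule. Here \(f_x (a, b)\) and \(f_y (a, b)\) are the partial derivatives of the function at \((a, b)\). The derivation shows that:

At a given position, the directional derivative in any direction is a linear combination of the partial derivatives, with the components of the unit vector in that direction as coefficients. When the direction coincides with the positive direction of a coordinate axis, the directional derivative is exactly the partial derivative. In other words, partial derivatives are directional derivatives along the coordinate axes, and directional derivatives in other directions are combinations of the partial derivatives.

In vector form, the vector of partial derivatives \(\nabla f(a, b) = (f_x (a, b), f_y (a, b))\) is called the gradient.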
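The identity between the limit definition and the dot product with the gradient can be checked numerically. The sketch below assumes an illustrative function `f` with a hand-derived gradient; the helper names are hypothetical.

```python
import math


def f(x, y):
    """Illustrative function, not taken from the article."""
    return x**2 + 3 * x * y + y**3


def grad_f(x, y):
    """Analytic gradient (f_x, f_y) of the example function above."""
    return (2 * x + 3 * y, 3 * x + 3 * y**2)


def directional_derivative_numeric(func, a, b, theta, t=1e-6):
    """Central-difference estimate of d/dt f(a + t cos(theta), b + t sin(theta)) at t = 0."""
    ux, uy = math.cos(theta), math.sin(theta)
    return (func(a + t * ux, b + t * uy) - func(a - t * ux, b - t * uy)) / (2 * t)


def directional_derivative_from_gradient(a, b, theta):
    """Dot product of the gradient with the unit vector (cos(theta), sin(theta))."""
    fx, fy = grad_f(a, b)
    return fx * math.cos(theta) + fy * math.sin(theta)


a, b, theta = 1.0, 2.0, math.pi / 3
numeric = directional_derivative_numeric(f, a, b, theta)
from_gradient = directional_derivative_from_gradient(a, b, theta)
print(numeric, from_gradient)  # the two values should agree closely
```

Setting `theta = 0` recovers the partial derivative with respect to \(x\), in line with the statement that partial derivatives are directional derivatives along the axes.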

Gradient

The gradient is written \(\nabla f\). For two variables it is \((\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y})\); for more variables it is \((\frac{\partial z}{\partial x}, \frac{\partial z}{\partial y},\dots)\).

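Continuing the derivation of the directional derivative above, the directional derivative at \((a,b)\) in the direction \(\theta\) is
\[ \begin{aligned} &\left(f_x (a, b), f_y (a, b)\right) \cdot(\cos \theta, \sin \theta) \\ =& |(f_x (a, b), f_y (a, b))| \cdot |\vec u| \cdot \cos \phi \\=& |\nabla f(a,b)| \cdot \cos \phi \end{aligned} \]
where \(\phi\) is the angle between \(\nabla f(a,b)\) and \(\vec u\), and \(|\vec u| = 1\) because \(\vec u\) is a unit vector. Clearly, when \(\phi = 0\), that is, when \(\vec u\) points in the same direction as the gradient \(\nabla f(a,b)\), the directional derivative attains its maximum, which is the magnitude of the gradient \(|\nabla f(a,b)|\). When \(\phi = \pi\), that is, when \(\vec u\) points opposite to the gradient \(\nabla f(a,b)\), the directional derivative attains its minimum, which is the negative of the gradient magnitude. Moreover, the formula shows that when \(\phi < \frac{\pi}{2}\) the directional derivative is positive, meaning the function value increases in the direction \(\vec u\), and when \(\phi > \frac{\pi}{2}\) it is negative, meaning the function value decreases in that direction.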

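This gives us the geometric meaning of the gradient:

  1. The direction of the gradient at the current position is the direction in which the directional derivative is largest, i.e. the direction of fastest increase of the function; the opposite direction is the direction of fastest decrease.
  2. The length (magnitude) of the gradient at the current position is the value of the largest directional derivative.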
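A brute-force scan over directions illustrates both points. This is only a sketch under assumed inputs: the gradient is that of the illustrative function \(f(x, y) = x^2 + 3xy + y^3\) at \((1, 2)\), and the grid resolution `n` is arbitrary.

```python
import math


def grad_f(x, y):
    """Analytic gradient of the illustrative function f(x, y) = x^2 + 3xy + y^3."""
    return (2 * x + 3 * y, 3 * x + 3 * y**2)


fx, fy = grad_f(1.0, 2.0)        # (8, 15)
grad_norm = math.hypot(fx, fy)   # 17
grad_angle = math.atan2(fy, fx)  # angle of the gradient vector

# Evaluate the directional derivative on a fine grid of angles in [0, 2*pi).
n = 3600
angles = [2 * math.pi * k / n for k in range(n)]
values = [fx * math.cos(t) + fy * math.sin(t) for t in angles]

best = max(range(n), key=lambda k: values[k])
worst = min(range(n), key=lambda k: values[k])

print(angles[best], grad_angle)   # steepest-ascent angle vs. gradient angle
print(values[best], grad_norm)    # largest directional derivative vs. |grad f|
print(values[worst], -grad_norm)  # smallest directional derivative vs. -|grad f|
```

Up to the grid resolution, the best angle matches the gradient's angle and the extreme values match plus and minus the gradient magnitude.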

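Gradients in contour plots

When optimization algorithms are explained, we often see a contour plot of the objective function, like the figure below, taken from the link "Applet: Gradient and directional derivative on a mountain".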

[Figure: surface z = f(x, y) (left) and its contour plot (right)]

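In the figure, the red dot is the current position, the red arrow is the gradient, and the green arrow is another direction, at an angle \(\theta\) to the gradient.

Projecting the contour lines of the surface \(z=f(x, y)\) in the left panel onto the \(xy\)-plane gives the contour plot in the right panel.

The gradient is perpendicular to the contour lines. Why?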

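A contour line, as the name suggests, is a line along which every point has the same height (function value). Let one contour line be \(z=f(x,y)=C\), where \(C\) is a constant, and take the total differential of both sides:
\[ \begin{aligned} dz = &\frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy \\=& \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right) \cdot (dx, dy) \\=& dC = 0\end{aligned} \]
Geometrically, taking the total differential of both sides means moving an infinitesimal step along the current contour line and requiring that both sides of the equation change by the same amount. The change in \(f(x, y)\) has two sources: one from the change in \(x\), the other from the change in \(y\). To first order, the change due to \(x\) is \(\frac{\partial f}{\partial x} dx\), the change due to \(y\) is \(\frac{\partial f}{\partial y} dy\), and their sum is the total change in \(z\). The right-hand side is a constant, and because the step stays on the contour line, its change is 0; the left-hand side equals the right-hand side. Written as an inner product, \((\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y})\) is the gradient and \((dx, dy)\) is an infinitesimal displacement from the point along the contour line. Since their inner product is 0, the two are perpendicular, and it follows that the gradient is perpendicular to the contour line.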

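Furthermore, since the gradient points in the direction of fastest increase, in a contour plot the gradient points toward contour lines of greater height.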
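The perpendicularity can be checked on a contour that has a closed-form parametrization. The sketch below assumes the illustrative function \(f(x, y) = x^2 + 2y^2\), whose contours are ellipses; the level `C`, the parameter `s`, and the helper names are arbitrary choices.

```python
import math

C = 4.0  # contour level, chosen arbitrarily for illustration


def f(x, y):
    """Illustrative function whose contours f = C are ellipses."""
    return x**2 + 2 * y**2


def grad_f(x, y):
    """Analytic gradient of f."""
    return (2 * x, 4 * y)


def contour_point(s):
    """Point on the contour f(x, y) = C, parametrized by s."""
    return (math.sqrt(C) * math.cos(s), math.sqrt(C / 2) * math.sin(s))


def contour_tangent(s):
    """Derivative of contour_point with respect to s: a direction along the contour."""
    return (-math.sqrt(C) * math.sin(s), math.sqrt(C / 2) * math.cos(s))


s = 0.7
x, y = contour_point(s)
gx, gy = grad_f(x, y)
tx, ty = contour_tangent(s)

dot = gx * tx + gy * ty  # gradient . tangent, should vanish up to rounding
step = 1e-3
uphill = f(x + step * gx, y + step * gy) - f(x, y)  # change in f after a small step along the gradient
print(f(x, y), dot, uphill)
```

The dot product is zero up to rounding, and a small step along the gradient raises the function value, which matches the gradient pointing toward higher contours.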

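Gradient of an implicit function

Similarly, an implicit function \(f(x,y)=0\) can be viewed as a contour line. With two variables, differentiating both sides shows that the gradient is perpendicular to the curve; with more variables, it shows that the gradient is perpendicular to the higher-dimensional surface.

That is, the gradient of an implicit function is the normal vector of its curve or higher-dimensional surface.

With the normal vector in hand, the tangent line or tangent plane is easy to compute. Let \((a,b)\) be a point on the curve \(f(x , y) = 0\). Taking the total differential gives the gradient \((f_x, f_y)\) at that point, so the tangent line there is \(f_x (x-a) + f_y (y-b) = 0\). This amounts to replacing the differential vector \((dx, dy)\) above with \((x-a, y-b)\); geometrically, the normal vector is perpendicular to every vector along the tangent line (or, in higher dimensions, in the tangent plane).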
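As a small worked instance of the tangent-line formula, consider a circle of radius 5 and the point \((3, 4)\) on it. The curve, the point, and the names `F`, `grad_F`, and `tangent_line` are illustrative assumptions.

```python
def F(x, y):
    """Implicit curve F(x, y) = 0: a circle of radius 5 (illustrative choice)."""
    return x**2 + y**2 - 25


def grad_F(x, y):
    """Analytic gradient of F, which serves as the normal vector of the curve."""
    return (2 * x, 2 * y)


a, b = 3.0, 4.0        # a point on the curve, since 9 + 16 - 25 = 0
fx, fy = grad_F(a, b)  # normal vector (6, 8)


def tangent_line(x, y):
    """Left-hand side of the tangent line equation f_x (x - a) + f_y (y - b) = 0."""
    return fx * (x - a) + fy * (y - b)


# The direction (-f_y, f_x) is perpendicular to the normal, so it lies along the tangent line.
dx, dy = -fy, fx
on_line = tangent_line(a + dx, b + dy)

# Moving a small distance along the tangent changes F only at second order.
eps = 1e-4
residual = F(a + eps * dx, b + eps * dy)
print(on_line, residual)
```

Here the tangent line is \(6(x-3) + 8(y-4) = 0\), which simplifies to \(3x + 4y = 25\).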

Summary

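With that, the answers to the questions posed at the beginning follow easily:

  • The vector of partial derivatives is the gradient.
  • The directional derivative is the inner product of the gradient with the unit vector in that direction, i.e. a combination of the partial derivatives with the components of the unit vector as coefficients.
  • The gradient direction is the direction of the largest directional derivative, and the gradient magnitude is the value of that largest directional derivative.
  • The total differential is the inner product of the gradient and the differential vector \((dx, dy)\).
  • Along a contour line the total differential is zero, so the gradient is perpendicular to the contour lines and points toward contours of greater height.
  • An implicit function can be viewed as a contour line, and its gradient is the normal vector of the curve (or higher-dimensional surface).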

That is all.

Origin www.cnblogs.com/shine-lee/p/11715033.html