Here, three concepts are discussed: the gradient vector, the Jacobian matrix, and the Hessian matrix.
Let the independent variable be x = (x1, x2, …, xn)^T. For the dependent variable:
① When f(x) is scalar-valued (one-dimensional), the vector formed by its first partial derivatives is the gradient vector g(x), and the matrix formed by its second partial derivatives is the Hessian matrix.
② When f(x) is vector-valued, f(x) = (f1(x), f2(x), …, fm(x))^T, the matrix formed by its first partial derivatives is the Jacobian matrix.
Method/Steps
-
Gradient vector:
Definition:
The objective function f is scalar-valued, a function of the independent variable vector x = (x1, x2, …, xn)^T.
Taking the gradient of f with respect to x yields a vector of the same dimension as x, called the gradient vector:
g(x) = ∇f(x) = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn)^T
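As a concrete sketch, the gradient vector can be approximated numerically with central differences (the example function f and step size h below are illustrative choices, not from the original):

```python
# Numerical gradient of a scalar function f: R^n -> R via central differences.
# A minimal sketch; f and h are illustrative assumptions.

def numerical_gradient(f, x, h=1e-6):
    """Return the gradient vector of f at x; same dimension as x."""
    grad = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h  # x with x_i nudged up
        xm = list(x); xm[i] -= h  # x with x_i nudged down
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

# Example: f(x) = x1^2 + 3*x2, so grad f = (2*x1, 3)^T
f = lambda x: x[0] ** 2 + 3 * x[1]
print(numerical_gradient(f, [2.0, 1.0]))  # approximately [4.0, 3.0]
```

Note that the result has the same dimension as x, as stated above.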
-
Jacobian matrix:
Definition:
The objective function f is a vector of functions, f = (f1(x), f2(x), …, fm(x))^T,
where the independent variable is x = (x1, x2, …, xn)^T.
Differentiating the function vector f with respect to x yields a matrix: the number of rows is the dimension m of f, and the number of columns is the dimension n of x. This matrix is called the Jacobian matrix.
Each row is the transpose of the gradient vector of the corresponding component function fi.
[Note]: the gradient vector is a special case of the Jacobian matrix;
when the objective function is a scalar function (m = 1), the Jacobian matrix reduces to a single row: the transpose of the gradient vector.
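A minimal numerical sketch of this definition: the m×n Jacobian is built row by row from the (transposed) gradients of the component functions. The example functions below are illustrative assumptions:

```python
# Numerical Jacobian of a function vector f = (f1, ..., fm)^T at x.
# Row i is the transposed gradient of fs[i]; h is an illustrative step size.

def numerical_jacobian(fs, x, h=1e-6):
    """m x n Jacobian: m = len(fs) rows, n = len(x) columns."""
    def grad(f):
        g = []
        for i in range(len(x)):
            xp = list(x); xp[i] += h
            xm = list(x); xm[i] -= h
            g.append((f(xp) - f(xm)) / (2 * h))
        return g
    return [grad(f) for f in fs]

# Example: f1(x) = x1*x2, f2(x) = x1 + x2^2 (m = 2 functions, n = 2 variables)
fs = [lambda x: x[0] * x[1], lambda x: x[0] + x[1] ** 2]
print(numerical_jacobian(fs, [2.0, 3.0]))
# rows: [x2, x1] = [3, 2] and [1, 2*x2] = [1, 6] (approximately)
```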
-
Hessian matrix:
In fact, the Hessian matrix is the Jacobian matrix of the gradient vector g(x) with respect to the independent variable x, with entries H_ij = ∂²f/∂xi∂xj:
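This relationship can be sketched directly in code: differentiate the gradient once more to obtain the Hessian. The example function and step size below are illustrative assumptions:

```python
# Hessian of a scalar f, computed as the Jacobian of its gradient g(x).
# A minimal numerical sketch; f and h are illustrative choices.

def hessian(f, x, h=1e-4):
    def grad(y):  # central-difference gradient of f at y
        g = []
        for i in range(len(y)):
            yp = list(y); yp[i] += h
            ym = list(y); ym[i] -= h
            g.append((f(yp) - f(ym)) / (2 * h))
        return g
    n = len(x)
    cols = []
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        gp, gm = grad(xp), grad(xm)
        # column j: derivative of each gradient component w.r.t. x_j
        cols.append([(gp[i] - gm[i]) / (2 * h) for i in range(n)])
    # entry (i, j) = d g_i / d x_j = d^2 f / (dx_j dx_i)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

# Example: f(x) = x1^2 * x2 + x2^3, whose Hessian is [[2*x2, 2*x1], [2*x1, 6*x2]]
f = lambda x: x[0] ** 2 * x[1] + x[1] ** 3
print(hessian(f, [1.0, 2.0]))  # approximately [[4, 2], [2, 12]]
```

For a twice continuously differentiable f the Hessian is symmetric, which the example confirms.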
-
Inner product:
The inner product of vectors a and b equals the length of a times the length of b times the cosine of the angle between them: a·b = |a||b|cosθ = Σ ai·bi.
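A short sketch of both forms of the inner product (the example vectors are illustrative):

```python
import math

# Inner product as a component sum, and the cosine of the included angle
# recovered from a.b = |a||b|cos(theta). Example vectors are assumptions.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def angle_cosine(a, b):
    """cos(theta) between a and b, from the identity a.b = |a||b|cos(theta)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [1.0, 0.0], [1.0, 1.0]
print(dot(a, b), angle_cosine(a, b))  # 1.0 and cos(45 deg) ~= 0.7071
```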
-
Application of Hessian Matrix in Newton's Method
Newton's method is mainly used for two purposes: finding the roots of an equation, and optimization.
1) Solving equations:
Not every equation has a closed-form root formula, and even when one exists it may be too complicated to use.
Newton's method solves such equations iteratively.
Principle:
Using Taylor's formula, expand f to first order at x0: f(x) ≈ f(x0) + (x − x0)f'(x0).
To solve f(x) = 0, set f(x0) + (x − x0)f'(x0) = 0, which gives x = x1 = x0 − f(x0)/f'(x0).
Because the first-order Taylor expansion is only approximate, x1 is generally not an exact root;
but f(x1) is closer to 0 than f(x0), so we solve iteratively.
This gives the iteration x_{n+1} = x_n − f(x_n)/f'(x_n); under suitable conditions (f' nonzero near the root and a good starting point), the iterates converge to a root x* with f(x*) = 0.
The whole process is as follows:
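The full iteration can be sketched in code (a minimal example, not the article's figure; the equation x² − 2 = 0, tolerance, and iteration cap are illustrative assumptions):

```python
# Newton's method for root finding: x_{n+1} = x_n - f(x_n) / f'(x_n).
# tol and max_iter are illustrative choices.

def newton_root(f, fprime, x0, tol=1e-10, max_iter=50):
    """Iterate until |f(x)| < tol or the iteration budget runs out."""
    x = x0
    for _ in range(max_iter):
        if abs(f(x)) < tol:
            return x
        x = x - f(x) / fprime(x)
    return x

# Example: root of f(x) = x^2 - 2 (i.e., sqrt(2)), starting from x0 = 1
root = newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
print(root)  # approximately 1.41421356...
```

In practice one also guards against f'(x_n) = 0, where the update is undefined.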
-
2) Optimization:
In an optimization problem,
linear problems can be solved directly (for example, by a fixed-point algorithm),
but for nonlinear optimization problems, Newton's method provides an iterative solution.
Suppose we are optimizing an objective function f; finding the minima of f can be transformed into solving f' = 0.
Treat the optimization problem as an equation-solving problem (f' = 0).
To find the roots of f' = 0, expand f(x) to second order with Taylor's formula:
f(x + Δx) ≈ f(x) + f'(x)Δx + (1/2)f''(x)Δx²
To minimize the right-hand side over Δx, take its derivative with respect to Δx and set it to zero (note: f'(x) and f''(x) are constants with respect to Δx).
This gives:
f'(x) + f''(x)Δx = 0
Solving gives:
Δx = −f'(x_n)/f''(x_n)
which yields the iteration formula:
x_{n+1} = x_n − f'(x_n)/f''(x_n),  n = 0, 1, …
It is generally believed that Newton's method, because it uses the curvature information of the function itself, converges more easily than gradient descent (fewer iterations).
The figure below shows an example of minimizing an objective function: the red curve is the path taken by Newton's method, and the green curve is the path taken by gradient descent.
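The comparison can be sketched on a one-dimensional example (the objective f(x) = x⁴ − 3x³ + 2, starting point, learning rate, and iteration counts below are illustrative assumptions, not the article's figure):

```python
# Newton's method vs. gradient descent for minimizing f(x) = x^4 - 3x^3 + 2.
# The minimizer satisfies f'(x) = 4x^3 - 9x^2 = 0 at x = 9/4 = 2.25.
# Starting point, learning rate, and iteration counts are illustrative.

def f_prime(x):  return 4 * x ** 3 - 9 * x ** 2
def f_second(x): return 12 * x ** 2 - 18 * x

def newton_min(x, iters):
    """x_{n+1} = x_n - f'(x_n) / f''(x_n): uses curvature information."""
    for _ in range(iters):
        x = x - f_prime(x) / f_second(x)
    return x

def gradient_descent(x, iters, lr=0.01):
    """x_{n+1} = x_n - lr * f'(x_n): first-order information only."""
    for _ in range(iters):
        x = x - lr * f_prime(x)
    return x

# Newton reaches the minimizer in far fewer iterations than gradient descent.
print(newton_min(3.0, 10), gradient_descent(3.0, 200))
```

Here 10 Newton steps already match what gradient descent needs roughly 200 steps to achieve, illustrating the claim about iteration counts; Newton does, however, require f'' and can diverge if f'' changes sign along the way.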